How is epoll_wait implemented in Linux for x86_64 - linux-kernel

I would like to know how the epoll_wait syscall is implemented in Linux for x86_64. I grepped the source code and found an entry point named sys_epoll_wait. However, I couldn't find its implementation. Could anyone point me to the correct file to look at for this specific syscall? Thank you very much.

It's in fs/eventpoll.c:
SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
int, maxevents, int, timeout)
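For orientation, here is a minimal userspace sketch of a call that passes the same four arguments (epfd, events, maxevents, timeout) as the definition above. The helper name wait_readable is hypothetical, and I'm assuming the libc wrapper forwards to the epoll_wait syscall; some libc builds may route it through epoll_pwait instead.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the kernel source): waits up to timeout_ms
 * for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    struct epoll_event out;
    int epfd, n;

    epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }

    /* Same argument order as SYSCALL_DEFINE4(epoll_wait, ...):
     * epfd, events, maxevents, timeout. */
    n = epoll_wait(epfd, &out, 1, timeout_ms);
    close(epfd);
    return n;
}
```

Calling wait_readable on the read end of an empty pipe with a timeout of 0 should report a timeout; once a byte has been written to the other end it should report one ready descriptor.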

Related

Regarding msgrcv in android-kernel?

I was running a test suite for IPC-related functionality in the Android kernel. While I was testing the msgrcv system call, it returned the error "function not implemented".
So is it true that the msgrcv() system call is not implemented in the Android kernel? If so, why, and which system call in the Android kernel serves the purpose of msgrcv()?
I found a related statement which says System V IPCs (including message queues) are not implemented on Bionic, but I am not sure what that means.
Update: I was able to find the definition of msgrcv in the Android kernel, but I am not sure why it returns the error "function not implemented".
Code snippet below:
SYSCALL_DEFINE5(msgrcv, int, msqid, struct msgbuf __user *, msgp, size_t, msgsz,
                long, msgtyp, int, msgflg)
{
        return do_msgrcv(msqid, msgp, msgsz, msgtyp, msgflg, do_msg_fill);
}
Please comment if the information seems incomplete or vague. Help is appreciated.
System V IPC may be available in the kernel, but the system call interfaces are not implemented in Bionic libc. For example, /bionic/libc/arch-arm/syscalls/ contains all the system call implementations for ARM.
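One way to tell a missing libc wrapper apart from a kernel that rejects the call is to skip the wrapper and issue the syscall directly with syscall(2). The sketch below is only an illustration: the function name sysv_msg_supported is hypothetical, and it assumes the architecture exposes SYS_msgget and SYS_msgctl as direct syscall numbers (some older 32-bit ABIs multiplex these through ipc(2) instead).

```c
#include <errno.h>
#include <sys/ipc.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns 1 if the kernel accepts msgget, 0 if it
 * reports ENOSYS ("function not implemented"), and -1 on any other error
 * or when no direct msgget syscall number is defined for this ABI.
 */
int sysv_msg_supported(void)
{
#if defined(SYS_msgget) && defined(SYS_msgctl)
    long id = syscall(SYS_msgget, (long)IPC_PRIVATE, (long)(IPC_CREAT | 0600));

    if (id >= 0) {
        /* Remove the queue again so the probe leaves nothing behind. */
        syscall(SYS_msgctl, id, (long)IPC_RMID, 0L);
        return 1;
    }
    return errno == ENOSYS ? 0 : -1;
#else
    return -1; /* no direct syscall number available on this ABI */
#endif
}
```

If this returns 0, the "function not implemented" error comes from the kernel itself rather than from the C library. One possible cause, which is an assumption here and not something the question confirms, is a kernel built without CONFIG_SYSVIPC.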

What are parameters of sys_clone() system call and how to hook it in Ubuntu?

I'm using a kernel module to hook system calls like sys_read(), sys_write() and so on. The way I'm hooking it is pretty much like this post.
Now I want to hook sys_clone() in the same way. What I can find from the source code is as follows.
long sys_clone(unsigned long, unsigned long, int __user *, int, int __user*)
First, I have no idea what those parameters mean. I tried to printk them to see. However, even when I tested with a program that invokes clone() many times, I didn't see any output from my_sys_clone(). Is that because clone() never reached the sys_clone() I hooked? Or are there special cases for hooking sys_clone()?

Linux Syscalls with > 6 parameters

Is it possible to write a (Linux kernel) syscall function that has more than 6 input parameters? Looking at the header, I see that the defined syscall macros have a maximum of 6 parameters. I'm tempted to try to define SYSCALL7 and SYSCALL8 to allow for 7 and 8 parameters, but I'm not quite sure if that will actually work.
For x86, the following function (from x86...syscall.h) copies the arguments over:
static inline void syscall_get_arguments(struct task_struct *task,
                                         struct pt_regs *regs,
                                         unsigned int i, unsigned int n,
                                         unsigned long *args)
{
        BUG_ON(i + n > 6);
        memcpy(args, &regs->bx + i, n * sizeof(args[0]));
}
This function is described well in the comments in asm-generic/syscall.h. It copies the syscall's arguments out of the saved registers into args, and there is a limit of 6 arguments. It may be implemented in a number of ways depending on the architecture. For x86 (from the snippet above), it looks like the arguments are all passed in registers.
So, if you want to pass more than 6 arguments, use a struct. If you must have a SYSCALL7, then you are going to have to create a custom kernel and likely modify almost every step of the syscall process. x86_64 would likely accommodate this change more easily, since it has more registers than x86.
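To make the six-argument limit concrete, here is a small userspace model of what syscall_get_arguments does. It is only a sketch: mock_regs and mock_get_arguments are hypothetical names, the register file is modelled as a plain array where the kernel uses pt_regs fields, and assert() stands in for BUG_ON.

```c
#include <assert.h>
#include <string.h>

#define MOCK_MAX_ARGS 6

/* Hypothetical stand-in for pt_regs: six argument registers, in order. */
struct mock_regs {
    unsigned long arg[MOCK_MAX_ARGS];
};

/* Copy n arguments starting at index i, refusing to go past the sixth. */
void mock_get_arguments(const struct mock_regs *regs,
                        unsigned int i, unsigned int n,
                        unsigned long *args)
{
    assert(i + n <= MOCK_MAX_ARGS);   /* plays the role of BUG_ON(i + n > 6) */
    memcpy(args, &regs->arg[i], n * sizeof(args[0]));
}
```

Asking for three arguments starting at index 2 copies the third, fourth, and fifth slots; asking for anything that would reach a seventh slot trips the assertion, which is the same wall a SYSCALL7 would hit.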
What if one day you need 20 parameters? I think the best way around your syscall problem is to use a void * pointer.
This way you can pass a struct containing an arbitrary number of parameters.
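A rough userspace sketch of that idea follows. The names my_call_args and my_call are hypothetical, and the size field is my own addition so the callee can sanity-check what it was handed. In a real syscall the kernel would first have to copy the struct in from user space (for example with copy_from_user) before reading any field.

```c
#include <stddef.h>

/* Hypothetical parameter block; the field names are illustrative only. */
struct my_call_args {
    size_t size;        /* sizeof(struct my_call_args) as seen by the caller */
    long   values[8];   /* eight inputs, more than six registers can carry */
};

/* Stand-in for the syscall body: one pointer argument instead of eight. */
long my_call(const void *argp)
{
    const struct my_call_args *a = argp;
    long sum = 0;

    if (a == NULL || a->size != sizeof(*a))
        return -1;      /* a real syscall would return -EINVAL here */
    for (size_t i = 0; i < 8; i++)
        sum += a->values[i];
    return sum;
}
```

The caller fills in the struct, sets size to sizeof the struct, and passes its address as the single argument, so the register limit no longer bounds the number of parameters.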
Generally, there is no inherent limit on the number of parameters. But all of this needs a standard: kernel module writers and users (callers) need to agree on a standard way to pass information from caller to callee (and vice versa), whether on the stack or in registers. This is called the "ABI" or calling convention. The standards differ between x86 and AMD64, and generally they are the same across all UNIX-like systems on x86: Linux, FreeBSD, etc.
http://www.x86-64.org/documentation/abi.pdf
E.g., the x86 syscall ABI:
http://lwn.net/Articles/456731/
http://esec-lab.sogeti.com/post/2011/07/05/Linux-syscall-ABI
For more details, please see (to avoid repetition):
What are the calling conventions for UNIX & Linux system calls on x86-64
Why does Windows64 use a different calling convention from all other OSes on x86-64?
And userspace will have its own ABI as well:
https://www.kernel.org/doc/Documentation/ABI/README
https://lwn.net/Articles/234133/

Which functions are the write/read interface for the linux bsg driver

I am not a driver writer and have a question about what functions are actually called within the bsg driver when one does a write(2)/read(2) from user-land. My CentOS system is using Linux 2.6.32. Surprisingly, though I have the sources for the build used by this CentOS system installed, the bsg.c file isn't there (huh?). So, I downloaded from kernel.org the 2.6.32 sources.
I'm looking in .../linux-2.6.32.61/block/bsg.c. For that source version, my question is: is this function (on line 661) called when I call write(2) from user land?
static ssize_t
bsg_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
I'm trying to track down why I'm getting EINVAL when calling write(2) in some cases but not in others when attempting to get SCSI Log Sense data. If I'm on the right track in the driver sources, the only time that EINVAL is returned to the caller is when the size of the data being written to the descriptor is not evenly divisible by sizeof(sg_io_v4) (defined in /usr/include/linux/bsg.h).
Andy
Yes, it is the right function. In the same file you can find static const struct file_operations bsg_fops, which defines the functions to call when userspace does something with the device.
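If the divisibility rule described in the question is indeed the source of the EINVAL, a quick userspace pre-flight check can rule it in or out before calling write(2). This is only a sketch: bsg_write_size_ok is a hypothetical helper, and it encodes the asker's reading of the 2.6.32 sources, not a verified statement about every kernel version.

```c
#include <stddef.h>
#include <linux/bsg.h>   /* struct sg_io_v4 */

/*
 * Hypothetical pre-flight check: returns nonzero if count is a whole
 * multiple of sizeof(struct sg_io_v4), zero otherwise.
 */
int bsg_write_size_ok(size_t count)
{
    return count % sizeof(struct sg_io_v4) == 0;
}
```

Logging the result of this check next to each failing write(2) would show whether the byte count is the problem or whether the EINVAL originates somewhere else.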

Retrieving command line argument of process at driver level

Hello, I am writing a minifilter driver to intercept all the IRP packets from a certain process, say a.exe.
In the driver code, this can be done by checking the command-line arguments that started the process.
Does anyone know how I can retrieve the command-line arguments?
Thanks in advance.
There's no supported way to do this from within kernel-mode. In fact, trying to access user-mode process information from the kernel is a pain in general. I would suggest firing up a request to a user-mode service, which can then find that information and pass it back down to your kernel component.
However, there is an undocumented method to do it. If you can get a handle to an EPROCESS struct for the target process, you can get at a pointer to the PEB (process environment block) struct within it, which then has a pointer to an RTL_USER_PROCESS_PARAMETERS structure, which has a member called CommandLine.
Example:
UNICODE_STRING *commandLine = &epProcess->Peb->ProcessParameters->CommandLine; /* CommandLine is a UNICODE_STRING member, so take its address */
The downside to this is that EPROCESS is almost entirely opaque and PEB is semi-opaque too, meaning that it may change in future versions of Windows. I certainly wouldn't advocate trying this in production code.
Try using the NtQueryInformationProcess or ZwQueryInformationProcess function with the PROCESSINFOCLASS parameter as ProcessBasicInformation. The output parameter, ProcessInformation, will be a struct of type PROCESS_BASIC_INFORMATION. As Polynomial mentioned, this struct has a pointer to the process's PEB struct, which contains the information you are looking for in its ProcessParameters field.

Resources