Why does epoll_ctl need the filedescriptor twice? - file-descriptor

In the example:
event.events = EPOLLIN;
event.data.fd = fd;
int ret = epoll_ctl(epoll_fd, EPOLL_CTL_ADD, event.data.fd, &event);
I pass the file descriptor in as both a member of event.data and as an argument in its own right.
What does epoll_ctl need the file descriptor twice?

This is a duplicate of about epoll_ctl()
The reason it needs it twice is that data inside event is a union. epoll_ctl does not know whether you actually provided a file descriptor or something else.
typedef union epoll_data {
void *ptr;
int fd;
uint32_t u32;
uint64_t u64;
} epoll_data_t;

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event)
Where:
epfd is the file descriptor returned by epoll_create which identifies the epoll instance in the kernel.
fd is the file descriptor we want to add to the epoll list/interest list.
op refers to the operation to be performed on the file descriptor fd. In general, three operations are supported:
Register fd with the epoll instance (EPOLL_CTL_ADD) and get notified about events that occur on fd
Delete/deregister fd from the epoll instance. This would mean that the process would no longer get any notifications about events on that file descriptor (EPOLL_CTL_DEL). If a file descriptor has been added to multiple epoll instances, then closing it will remove it from all of the epoll interest lists to which it was added.
Modify the events fd is monitoring (EPOLL_CTL_MOD)
event is a pointer to a structure called epoll_event which stores the event we actually want to monitor fd for.
The first field events of the epoll_event structure is a bitmask that indicates which events fd is being monitored for.
Like so, if fd is a socket, we might want to monitor it for the arrival of new data on the socket buffer (EPOLLIN). We might also want to monitor fd for edge-triggered notifications which is done by OR-ing EPOLLET with EPOLLIN. We might also want to monitor fd for the occurrence of a registered event but only once and stop monitoring fd for subsequent occurrences of that event. This can be accomplished by OR-ing the other flags (EPOLLET, EPOLLIN) we want to set for descriptor fd with the flag for only-once notification delivery EPOLLONESHOT. All possible flags can be found in the man page.
The second field of the epoll_event struct is a union field.
Source
Added some extra data apart from just what was asked for context.
Hope this helps!

Related

file descriptor in kernel space

I'm developing a charachter device driver for Linux.
I want to implement file-descriptor-targeted read() operation which will be a bit specific every time you open a device.
It is possible to identify the process where read() called from (using kernel current macro), but there can be several file descriptor associated with my device in this process.
I know that file descriptors got mapped to struct file objects just before making system call but can I get it back?
welcome to stackoverflow!
To achieve the goal you have specified in comment there are two methods:
ioctl and read :
Here you will have multiple buffers for each consumer to read from, and write buffer is different from read buffer. Each consumer immediatly after opening the device will fire an ioctl which will result in new buffer being allocated and a new token being generated for that buffer (something like this token numeber means this buffer). this token number should be passed back to the concernted consumer.
Now each consumer before making a read call will fire the ioctl giving the token number that will switch the current read buffer to that associated with that token number.
Now this method adds over head and you need to add locks too. Also no more than one consumer at a time can read from the device.
ioctl and mmap:
you can mmap the read buffer for each consumer and let it read from it at its own pace, using ioctl to request new data etc.
This will allow multiple consumers to read at the same time.
Or, you can malloc a new data buffer to read from on each open call and store the pointer to buffer in the private field of the file structure.
when ever a read is called this way you can just read the private data field of the file structure passed with the call and see which buffer is being talked about.
Also you can embed the whole structure containing the buffer pointer and size etc in the private field.

Is it possible to open a file allowing another processes to delete this file?

It seems the default os.Open call allows another processes to write the opened file, but not to delete it. Is it possible to enable deletion as well? In .NET this can be done using FileShare.Delete flag, is there any analog in Go?
os.Open will get you a file descriptor with flag O_RDONLY set; that means read only. You can specify your own flag by using os.OpenFile
O_RDONLY int = syscall.O_RDONLY // open the file read-only.
O_WRONLY int = syscall.O_WRONLY // open the file write-only.
O_RDWR int = syscall.O_RDWR // open the file read-write.
O_APPEND int = syscall.O_APPEND // append data to the file when writing.
O_CREATE int = syscall.O_CREAT // create a new file if none exists.
O_EXCL int = syscall.O_EXCL // used with O_CREATE, file must not exist
O_SYNC int = syscall.O_SYNC // open for synchronous I/O.
O_TRUNC int = syscall.O_TRUNC // if possible, truncate file when opened.
None of these modes, however, will allow you to have multiple writers on a single file. You can share the file descriptor by exec-ing or fork-ing but naively writing to the file from both processes will result in the OS deciding how to synchronise those writes -- which is almost never what you want.
Deleting a file while a process has a FD on it doesn't matter on unix-like systems. I'll go ahead and assume Windows won't like that, though.
Edit given the windows tag and #Not_a_Golfer's excellent observations:
You should be able to pass syscall.FILE_SHARE_DELETE as a flag to os.OpenFile on Windows, if that is what solves your problem.
If you need to combine several flags you can do so by or-ing them together:
syscall.FILE_SHARE_DELETE | syscall.SOME_OTHER_FLAG | syscall.AND_A_THIRD_FLAG
(note, however, that it's up to you to build a coherent flag)

What does actually cdev_add() do? in terms of registering a device to the kernel

What does cdev_add() actually do? I'm asking terms of registering a device with the kernel.
Does it add the pointer to cdev structure in some map which is indexed by major and minor number? How exactly does this happen when you say the device is added/registered with the kernel. I want to know what steps the cdev_add takes to register the device in the running kernel. We create a node for user-space using mknod command. Even this command is mapped using major and minor number. Does registration also do something similar?
cdev_add registers a character device with the kernel. The kernel maintains a list of character devices under cdev_map
static struct kobj_map *cdev_map;
kobj_map is basically an array of probes, which in this case is the list of character devices:
struct kobj_map {
struct probe {
struct probe *next;
dev_t dev;
unsigned long range;
struct module *owner;
kobj_probe_t *get;
int (*lock)(dev_t, void *);
void *data;
} *probes[255];
struct mutex *lock;
};
You can see that each entry in the list has the major and minor number for the device (dev_t dev), and the device structure (in the form of kobj_probe_t, which is a kernel object, which represents a cdev in this case). cdev_add adds your character device to the probes list:
int cdev_add(struct cdev *p, dev_t dev, unsigned count)
{
...
error = kobj_map(cdev_map, dev, count, NULL,
exact_match, exact_lock, p);
When you do an open on a device from a process, the kernel finds the inode associated to the filename of your device (via namei function). The inode has the major a minor number for the device (dev_t i_rdev), and flags (imode) indicating that it is a special (character) device. With this it can access the cdev list I explained above, and get the cdev structure instantiated for your device. From there it can create a struct file with the file operations to your cdev, and install a file descriptor in the process's file descriptor table.
This is what actually 'registering' a character device means and why it needs to be done. Registering a block device is similar. The kernel maintains another list for registered gendisks.
You can read Linux Device Driver. It is a little bit old, but the main ideas are the same. It is difficoult to explain a simple operation like cdev_add() and all the stuff around in few lines.
I suggest you to read the book and the source code. If you have trouble to navigate your source code, you can use some tag system like etags + emacs, or the eclipse indexer.
Please see the code comments here:
cdev_add() - add a char device to the system 464 *
#p: the cdev structure for the device 465 * #dev: the first device
number for which this device is responsible 466 * #count: the number
of consecutive minor numbers corresponding to this 467 *
device 468 * 469 * cdev_add() adds the device represented by #p to
the system, making it 470 * live immediately. A negative error code
is returned on failure. 471 */ `
the immediate answer to any such question is read the code. Thats what Linus say.
[edit]
the cdev_add basically adds the device to the system. What it means essentially is that after the cdev_add operation your new device will get visibility through the /sys/ file system. The function does all the necessary house keeping activities related to that particularly the kobj reference to your device will get inserted at its position in the object hierarchy. If you want to get more information about it, I would suggest some reading around /sysfs/ and struct kboj

Should open method in Linux device driver return a file descriptor?

I'm studying Linux Device Driver programming 3rd edition and I have some questions about the open method, here's the "scull_open" method used in that book:
int scull_open(struct inode *inode, struct file *filp){
struct scull_dev *dev; /* device information */
dev = container_of(inode->i_cdev, struct scull_dev, cdev);
filp->private_data = dev; /* for other methods */
/* now trim to 0 the length of the device if open was write-only */
if ( (filp->f_flags & O_ACCMODE) == O_WRONLY) {
if (down_interruptible(&dev->sem))
return -ERESTARTSYS;
scull_trim(dev); /* ignore errors */
up(&dev->sem);
}
return 0; /* success */
}
And my questions are:
Shouldn't this function returns a file descriptor to the device just opened?
Isn't the "*filp" local to this function, then why we copy the contents of dev to it?
How we could use later in read and write methods?
could someone writes to my a typical "non-fatty" implementation of open method?
ssize_t scull_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos){
struct scull_dev *dev = filp->private_data;
...}
Userspace open function is what you are thinking of, that is a system call which returns a file descriptor int. Plenty of good references for that, such as APUE 3.3.
Device driver "open method" is a function within file_operations structure. It is different than userspace "file open". With the device driver installed, when user code does open of the device (e.g. accessing /dev/scull0), this "open method" would then get called.
Shouldn't this function returns a file descriptor to the device just opened ?
In linux device driver, open() returns either 0 or negative error code. File descriptor is internally managed by kernel.
Isn't the "*filp" local to this function, then why we copy the contents of dev to it ?
filp represents opened file and a pointer to it is passed to driver by kernel. Sometimes this filp is used sometimes it is not needed by driver. Copy contents of dev is required so that when some other function let us say read() is called driver can retrieve some device specific data.
How we could use later in read and write methods ?
One of the most common way to use filp in read/write() is to acquire lock. When device is opened driver will create a lock. Now when a read/write occurs same lock will be used to prevent data buffer from corruption in case multiple process are accessing same device.
could someone writes to my a typical "non-fatty" implementation of open method ?
As you are just studying please enjoy exploring more. An implementation can be found here

Why CompletionKey in I/O completion port?

Remark from MSDN about CompletionKey in CreateIoCompletionPort function:
Use the CompletionKey parameter to help your application track which
I/O operations have completed. This value is not used by
CreateIoCompletionPort for functional control; rather, it is attached
to the file handle specified in the FileHandle parameter at the time
of association with an I/O completion port. This completion key should
be unique for each file handle, and it accompanies the file handle
throughout the internal completion queuing process. It is returned in
the GetQueuedCompletionStatus function call when a completion packet
arrives. The CompletionKey parameter is also used by the
PostQueuedCompletionStatus function to queue your own special-purpose
completion packets.
The above remarks leave me a question. Why use the CompletionKey given that we can associate user context with the file handle in an extended overlapped structure like this:
typedef struct s_overlappedplus
{
OVERLAPPED ol;
int op_code;
/*we can alternatively put user context over here instead of CompletionKey*/
LPVOID user_context;
} t_overlappedplus;
and retreive through CONTAINING_RECORD macro after completion?
Cool, I'm only convinced that CompletionKey is per-handle context while the extended overlapped structure is per-I/O one. But what's the philosophy behind such design and in what circumstance can it be necessary to use CompletionKey instead of an extended overlapped structure in term of user context?

Resources