file descriptor in kernel space - linux-kernel

I'm developing a character device driver for Linux.
I want to implement a file-descriptor-targeted read() operation that behaves slightly differently for each open of the device.
It is possible to identify the process that read() was called from (using the kernel's current macro), but there can be several file descriptors associated with my device in that process.
I know that a file descriptor gets mapped to a struct file object just before the system call is made, but can I get that mapping back?

Welcome to Stack Overflow!
To achieve the goal you specified in the comment, there are two methods:
ioctl and read:
Here you have multiple buffers, one per consumer, and the write buffer is separate from the read buffers. Immediately after opening the device, each consumer fires an ioctl which allocates a new buffer and generates a token for it (something like "this token number means this buffer"). That token number is passed back to the consumer concerned.
Each consumer then fires the ioctl before every read call, passing its token number, which switches the current read buffer to the one associated with that token.
This method adds overhead, and you need to add locking too. Also, no more than one consumer at a time can read from the device.
ioctl and mmap:
You can mmap the read buffer for each consumer and let it read at its own pace, using ioctl to request new data and so on.
This allows multiple consumers to read at the same time.
Or, you can kmalloc a new data buffer on each open call and store a pointer to it in the private_data field of the struct file.
Whenever read is called, you can then look at the private_data field of the struct file passed with the call and see which buffer is being talked about.
You can also embed a whole structure containing the buffer pointer, size, and so on in the private_data field, as sketched below.
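
A minimal sketch of that last approach, assuming a fixed-size per-open buffer; struct mydev_state and BUF_SIZE are made up for illustration:

#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

#define BUF_SIZE 4096

struct mydev_state {            /* one instance per open() */
    char *buf;
    size_t len;
};

static int mydev_open(struct inode *inode, struct file *filp)
{
    struct mydev_state *st = kzalloc(sizeof(*st), GFP_KERNEL);

    if (!st)
        return -ENOMEM;
    st->buf = kzalloc(BUF_SIZE, GFP_KERNEL);
    if (!st->buf) {
        kfree(st);
        return -ENOMEM;
    }
    filp->private_data = st;    /* per-open state travels with the fd */
    return 0;
}

static ssize_t mydev_read(struct file *filp, char __user *ubuf,
                          size_t count, loff_t *off)
{
    struct mydev_state *st = filp->private_data;  /* back to our buffer */

    if (*off >= st->len)
        return 0;
    if (count > st->len - *off)
        count = st->len - *off;
    if (copy_to_user(ubuf, st->buf + *off, count))
        return -EFAULT;
    *off += count;
    return count;
}

static int mydev_release(struct inode *inode, struct file *filp)
{
    struct mydev_state *st = filp->private_data;

    kfree(st->buf);
    kfree(st);
    return 0;
}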

Related

Linux Driver and API architecture for a data acquisition device

We're trying to write a driver/API for a custom data acquisition device, which captures several "channels" of data. For the sake of discussion, let's assume this is a several-channel video capture device. The device is connected to the system via an 8xPCIe Gen-1 link, which has a theoretical throughput of 16Gbps. Our actual data rate will be around 2.8Gbps (~350MB/sec).
Because of the data rate requirement, we think we have to be careful about the driver/API architecture. We've already implemented a descriptor based DMA mechanism and the associated driver. For example, we can start a DMA transaction for 256KB from the device and it completes successfully. However, in this implementation we're only capturing the data in the kernel driver, and then dropping it and we aren't streaming the data to the user-space at all. Essentially, this is just a small DMA test implementation.
We think we have to separate the problem into three sections: 1. Kernel driver 2. Userspace API 3. User Code
The acquisition device has a register in the PCIe address space which indicates whether there is data to read for any channel from the device. So, our kernel driver must poll for this bit-vector. When the kernel driver sees this bit set, it starts a DMA transaction. The user application however does not need to know about all these DMA transactions and data, until an entire chunk of data is ready (For example, assume that the device provides us with 16 lines of video data per transaction, but we need to notify the user only when the entire video frame is ready). We need to only transfer entire frames to the user application.
Here was our first attempt:
Our user-side API allows a user application to register a function callback for a "channel".
The user-side API has a "start" function, which can be called by the user application, which uses ioctl to send a start message to the kernel driver.
In the kernel driver, upon receiving the start message, we started a kernel thread, which continuously monitors the "data ready" bit-vector, and when it sees new data, copies it over to a driver-allocated (kmalloc) buffer. It keeps doing this until the size of the collected data reaches the "frame size".
At this point a custom Linux signal (similar to SIGINT, SIGHUP, etc.) is sent to the process which is running the driver. Our API catches this signal and then calls back the appropriate user callback function.
The user callback function calls a function in the API (transfer_data), which uses an ioctl call to send a userspace buffer address to the kernel, and the kernel completes the data transfer by doing a copy_to_user of the channel frame data to userspace.
All of the above is working OK, except that the performance is abysmal. We can only achieve about 2MB/sec of transfer rate. We need to completely re-write this and we're open to any suggestions or pointers to examples.
Other notes:
Unfortunately, we can not change anything in the hardware device. So we must poll for the "data-ready" bit and start DMA based on that bit.
Some people suggested to look at Infiniband drivers as a reference, but we're completely lost in that code.
You're probably way past this now, but if not here's my 2p.
It's hard to believe that your card can't generate interrupts when it has transferred data. It's got a DMA engine, and it can handle 'descriptors', which are presumably elements of a scatter-gather list. I'll assume that it can generate a PCIe 'interrupt'; YMMV.
Don't bother trawling the kernel for existing similar drivers. You might get lucky, but I suspect not.
You need to write a blocking read, which you supply a large memory buffer to. The driver read op (a) gets a list of user pages for your user buffer and locks them in memory (get_user_pages); (b) creates a scatter list with pci_map_sg; (c) iterates through the list (for_each_sg); (d) for each entry writes the corresponding physical bus address and data length to the DMA controller as what I presume you're calling a 'descriptor'. A sketch of this read op follows below.
The card now has a list of descriptors which correspond to the physical bus addresses of your large user buffer. When data arrives at the card, it writes it directly into user space, into your user buffer, while your user-level read is still blocked. When it has finished the descriptor list, the card has to be able to interrupt, or it's useless. The driver responds to the interrupt and unblocks your user-level read.
And that's it. The details are nasty, of course, and poorly documented, but that should be the basic architecture. If you really haven't got interrupts you can set up a timer in the kernel to poll for completion of transfer, but if it is really a custom card you should get your money back.
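
To make that concrete, here is a rough sketch of such a read op. Everything prefixed acq_ is hypothetical, and the kernel APIs named above have shifted signatures over the years (pci_map_sg has since been replaced by dma_map_sg), so treat this as the shape of the code rather than a drop-in implementation:

#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>
#include <linux/wait.h>

struct acq_dev {                        /* hypothetical device state */
    struct device *dmadev;              /* device used for DMA mapping */
    wait_queue_head_t wq;               /* read blocks here */
    bool dma_done;                      /* set by the completion IRQ */
};

/* hypothetical helpers that talk to the card's DMA engine */
static void acq_write_descriptor(struct acq_dev *dev, dma_addr_t addr,
                                 unsigned int len);
static void acq_start_dma(struct acq_dev *dev);

static ssize_t acq_read(struct file *filp, char __user *buf,
                        size_t len, loff_t *off)
{
    struct acq_dev *dev = filp->private_data;
    int nr_pages = DIV_ROUND_UP(len, PAGE_SIZE);
    struct scatterlist *sg, *sgent;
    struct page **pages;
    int i, pinned, mapped;

    pages = kmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
    sg = kmalloc_array(nr_pages, sizeof(*sg), GFP_KERNEL);
    if (!pages || !sg)
        return -ENOMEM;                 /* cleanup elided throughout */

    /* (a) pin the pages backing the user buffer */
    pinned = get_user_pages_fast((unsigned long)buf, nr_pages,
                                 FOLL_WRITE, pages);
    if (pinned != nr_pages)
        return -EFAULT;

    /* (b) build and map a scatter-gather list over those pages */
    sg_init_table(sg, nr_pages);
    for (i = 0; i < nr_pages; i++)
        sg_set_page(&sg[i], pages[i], PAGE_SIZE, 0);
    mapped = dma_map_sg(dev->dmadev, sg, nr_pages, DMA_FROM_DEVICE);

    /* (c)+(d) write each bus address/length pair to the DMA engine */
    for_each_sg(sg, sgent, mapped, i)
        acq_write_descriptor(dev, sg_dma_address(sgent),
                             sg_dma_len(sgent));
    acq_start_dma(dev);

    /* block until the completion interrupt wakes us */
    wait_event_interruptible(dev->wq, dev->dma_done);

    /* unmap (dma_unmap_sg), unpin (put_page), kfree, then return */
    return len;
}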

Put data back in socket buffer

Short question, didn't seem to find anything useful here or on Google: in the Winsock2 API, is it possible to put data back in the socket's internal buffer when you have retrieved it using recv() for example, so that it seems it was never actually read from the buffer?
No, it is not possible to inject data back into the socket's internal buffer. Either use the MSG_PEEK flag to read data without removing it from the socket's buffer, or else read the socket data into your own buffer, and then do whatever you want with your buffer. You could have your reading I/O logic always look for data in your buffer first, and then read more data from the socket only when your buffer does not have enough data to satisfy the read operation. Any data you inject back into your buffer will be seen by subsequent read operations.
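
For the second approach, a minimal sketch of such a user-side buffer, including an "unread" operation that injects data back; sock is assumed to be a connected SOCKET with Winsock already initialized, and sizing/overflow checks are omitted:

#include <winsock2.h>
#include <string.h>

static char fifo[4096];
static int  fifo_len = 0;

int buffered_recv(SOCKET sock, char *dst, int want)
{
    int n;

    /* top up the local buffer from the socket if needed */
    while (fifo_len < want) {
        n = recv(sock, fifo + fifo_len, sizeof(fifo) - fifo_len, 0);
        if (n <= 0)
            return n;
        fifo_len += n;
    }
    memcpy(dst, fifo, want);
    fifo_len -= want;
    memmove(fifo, fifo + want, fifo_len);   /* shift remainder down */
    return want;
}

void unread(const char *src, int len)       /* inject data back */
{
    memmove(fifo + len, fifo, fifo_len);    /* make room at the front */
    memcpy(fifo, src, len);
    fifo_len += len;
}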
You can use the MSG_PEEK flag in your recv() call.
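
For instance, a minimal sketch (again assuming a connected SOCKET with Winsock initialized; error handling is pared down):

#include <winsock2.h>

int peek_then_read(SOCKET sock)
{
    char buf[256];
    int peeked, consumed;

    /* MSG_PEEK: data stays in the socket's internal buffer */
    peeked = recv(sock, buf, sizeof(buf), MSG_PEEK);
    if (peeked <= 0)
        return peeked;

    /* a normal recv() returns the same bytes and removes them */
    consumed = recv(sock, buf, peeked, 0);
    return consumed;
}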

Where do the contents of the character device read parameters come from?

I have read that the read function of a character device driver looks like

static ssize_t device_read(struct file *filp,  /* see include/linux/fs.h */
                           char *buffer,       /* buffer to fill with data */
                           size_t length,      /* length of the buffer */
                           loff_t *offset)
My questions are
Are these parameters mandatory?
I couldn't see *filp and *offset used in the sample driver. What is the use of them?
Where does the data for *buffer and *length actually come from? In the code it is said that the buffer is in the user data segment. What does that actually mean?
Are these parameters mandatory?
No, these parameters are not mandatory; it all depends on how you want to implement your read operation. The user-space application has to pass everything that is required to the read system call, and it is then up to the driver what it wants to use.
I couldn't see *filp and *offset used in the sample driver. What is the use of them?
That is because the sample driver does not read an actual device; it just reads a global char string. A real driver reads from some device. To tell the driver which device user space wants to read, *filp is used as a device identifier. The offset simply gives the position on the device from which to start reading.
Where does the data for *buffer and *length actually come from? In the code it is said that the buffer is in the user data segment. What does that actually mean?
In a real scenario, data is read from the device indicated by filp, that data goes into buffer, and length is set accordingly. In the sample driver, instead of reading a device, it just reads a global char string for the sake of simplicity. The buffer is in the user data segment, meaning the user-space application has allocated it in its own data segment and passed its pointer to kernel space, so the kernel can hand back to the application the data the driver has read from the device. put_user is used for the appropriate transfer of data to the user-space buffer.
Let's say a user process wants to read some data from a file using the read system call. The user process provides a file descriptor, a buffer where the data should be read into, and the number of bytes to read.
The file descriptor of the read call gets translated to a struct file * by the kernel. The buffer and length arguments are the buffer and byte count provided by the user process.
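
To make that concrete, here is a minimal read implementation that uses all four parameters, in the spirit of the sample driver; the static msg string stands in for real device data:

#include <linux/fs.h>
#include <linux/uaccess.h>

static const char msg[] = "hello from the driver\n";

static ssize_t device_read(struct file *filp, char __user *buffer,
                           size_t length, loff_t *offset)
{
    loff_t pos = *offset;              /* current read position */

    /* filp is unused here, just as in the sample driver; a real
     * driver would use it to find the device or per-open state */

    if (pos >= sizeof(msg))
        return 0;                      /* past the end: signal EOF */
    if (length > sizeof(msg) - pos)
        length = sizeof(msg) - pos;    /* clamp to what is left */

    /* buffer is a user-space address, so a plain memcpy is not allowed */
    if (copy_to_user(buffer, msg + pos, length))
        return -EFAULT;

    *offset = pos + length;            /* advance the file position */
    return length;                     /* bytes actually transferred */
}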

Doing a zero-copy move of data from a Linux kernel buffer to hard disk

I am trying to move data from a buffer in kernel space onto the hard disk without incurring any additional copies from the kernel buffer to user buffers or any other kernel buffers. Any ideas/suggestions would be most helpful.
The use case is basically a demux driver which collects data into a demux buffer in kernel space, and this buffer has to be emptied periodically by copying the contents into a FUSE-based partition on the disk. As the buffer gets full, a user process is signalled, which then determines the sector numbers on the disk the contents need to be copied to.
I was hoping to mmap the above demux kernel buffer into user address space and issue a write system call to the raw partition device. But from what I can see, this data is being cached by the kernel on its way to the hard disk driver, and so I am assuming that involves additional copies by the Linux kernel.
At this point I am wondering if there is any other mechanism to do this without involving additional copies by the kernel. I realize this is an unusual usage scenario for non-embedded environments, but I would appreciate any feedback on possible options.
BTW, I have tried using O_DIRECT when opening the raw partition, but the subsequent write call fails if the buffer being passed is the mmapped buffer.
Thanks!
You need to expose your demux buffer as a file descriptor (presumably, if you're using mmap() then you're already doing this - great!).
On the kernel side, you then need to implement the splice_read member of struct file_operations.
On the userspace side, create a pipe(), then use splice() twice - once to move the data from the demux file descriptor into the pipe, and a second time to move the data from the pipe to the disk file. Use the SPLICE_F_MOVE flag.
As documented in the splice() man page, it will avoid actual copies where it can, by copying references to pages of kernel memory rather than the pages themselves.
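
A user-space sketch of that double splice; demux_fd (the descriptor exposing the demux buffer, whose driver implements splice_read) and disk_fd (the opened raw partition) are placeholder names:

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Move up to len bytes from the demux device to the disk file. */
ssize_t move_chunk(int demux_fd, int disk_fd, size_t len)
{
    int pipefd[2];
    ssize_t in, out = -1;

    if (pipe(pipefd) < 0)
        return -1;

    /* demux buffer -> pipe: moves page references, not the bytes */
    in = splice(demux_fd, NULL, pipefd[1], NULL, len, SPLICE_F_MOVE);
    if (in > 0)
        /* pipe -> disk file */
        out = splice(pipefd[0], NULL, disk_fd, NULL, in, SPLICE_F_MOVE);

    close(pipefd[0]);
    close(pipefd[1]);
    return in > 0 ? out : in;
}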

Handling streamed data via pipes

A Win32 application (the "server") is sending a continuous stream of data over a named pipe. GetNamedPipeInfo() tells me that input and output buffer sizes are automatically allocated as needed. The pipe is operating in byte mode (although it is sending data units that are bigger than 1 byte (doubles, to be precise)).
Now, my question is this: Can I somehow verify that my application (the "client") is not missing any data when reading from the pipe? I know that those read/write operations are buffered, but I suppose the buffers will not grow indefinitely if the client doesn't fetch the data quickly enough. How do I know if I missed something? Does the server (or the pipe?) silently discard data that is not read in time by the client?
BTW, can I rely on proper alignment of the data the client reads using ReadFile()? As far as I understood, ReadFile() may return with less bytes read than specified, i.e. NumberOfBytesRead <= NumberOfBytesToRead. Do I have to check every time that NumberOfBytesRead is a multiple of sizeof(double)?
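
As for the alignment question: since ReadFile() on a byte-mode pipe can return any number of bytes, the usual remedy is to loop until a whole number of doubles has arrived. A sketch, with hPipe assumed to be the open pipe handle:

#include <windows.h>

/* Read exactly count doubles, looping over partial reads. */
BOOL ReadDoubles(HANDLE hPipe, double *out, DWORD count)
{
    char *p = (char *)out;
    DWORD want = count * sizeof(double);
    DWORD got = 0, n;

    while (got < want) {
        /* ReadFile may return fewer bytes than requested */
        if (!ReadFile(hPipe, p + got, want - got, &n, NULL))
            return FALSE;
        got += n;
    }
    return TRUE;
}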
The write operation will block if there is no more room in the pipe's buffers. This is from my (old) copy of the SDK manual:
When an application uses the WriteFile function to write to a pipe, the write operation may not finish if the pipe buffer is full. The write operation is completed when a read operation (using the ReadFile function) makes more buffer space available.
Sorry, didn't find out how to comment on your post, Neil.
The write operation will block if there is no more room in the pipe's buffers.
I just discovered that Sysinternals' FileMon can also monitor pipe operations. For testing purposes I connected the client to the named pipe and did no read operations, just waiting. The server writes a few hundred kB to the pipe every 4-5 seconds, even though nobody is fetching the data from the pipe on the client side. No blocking write operation... and so far no buffer-size limit seems to have been reached.
This is either a very big buffer... or the server does some magic in addition to just using WriteFile() and waiting for the client to read.
