Netlink: sending from kernel to user - EAGAIN and ENOBUFS - linux-kernel

I'm having a lot of trouble sending netlink messages from a kernel module to a userspace daemon. They randomly fail. On the kernel side, genlmsg_unicast fails with EAGAIN, while on the user side, nl_recvmsgs_default (a libnl function) fails with NLE_NOMEM, which is caused by the recvmsg syscall failing with ENOBUFS.
The netlink messages are small; the maximum payload size is ~300 bytes.
Here is the code for sending a message from the kernel:
int send_to_daemon(void* msg, int len, int command, int seq, u32 pid) {
    struct sk_buff* skb;
    void* msg_head;
    int res, payload;

    payload = GENL_HDRLEN + nla_total_size(len) + 36;
    skb = genlmsg_new(payload, GFP_KERNEL);
    msg_head = genlmsg_put(skb, pid, seq, &psvfs_gnl_family, 0, command);
    nla_put(skb, PSVFS_A_MSG, len, msg);
    genlmsg_end(skb, msg_head);
    genlmsg_unicast(&init_net, skb, pid);
    return 0;
}
I have absolutely no idea why this is happening, and my project just won't work because of it! I really hope someone can help me with this.

I wonder if you are running on a 64-bit machine. If that is the case, I suspect that using an int as the type of payload could be the root of some issues, as genlmsg_new() expects a size_t, which is 64 bits on x86_64.
Secondly, I don't think you need to add GENL_HDRLEN to payload, as this is taken care of by genlmsg_new() (via genlmsg_total_size(), which calls genlmsg_msg_size(), which finally does the addition). Why the + 36, by the way? It does not look very portable, nor is it explicit about what it is there for.
It's hard to tell more without having a look at the rest of the code.
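To make that concrete, here is roughly how the sending function could look with just nla_total_size() for the payload and error checking on each step. This is an untested sketch based on the suggestions above; PSVFS_A_MSG and psvfs_gnl_family are taken from the question, everything else is illustrative.
static int send_to_daemon(void *msg, size_t len, int command, int seq, u32 pid)
{
    struct sk_buff *skb;
    void *msg_head;
    int rc;

    /* Size the skb from the attribute payload only; genlmsg_new() already
     * accounts for the genetlink and netlink headers. */
    skb = genlmsg_new(nla_total_size(len), GFP_KERNEL);
    if (!skb)
        return -ENOMEM;

    msg_head = genlmsg_put(skb, pid, seq, &psvfs_gnl_family, 0, command);
    if (!msg_head) {
        nlmsg_free(skb);
        return -ENOMEM;
    }

    rc = nla_put(skb, PSVFS_A_MSG, len, msg);
    if (rc) {
        nlmsg_free(skb);
        return rc;
    }

    genlmsg_end(skb, msg_head);

    /* genlmsg_unicast() consumes the skb; it returns a negative error
     * (e.g. -EAGAIN) when the receiver's socket buffer is full. */
    return genlmsg_unicast(&init_net, skb, pid);
}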

I was having a similar problem, receiving ENOBUFS from recvmsg on a netlink socket. I found that my problem was the kernel socket buffer filling up before userspace could drain it.
From the netlink(7) man page:
However, reliable transmissions from kernel to user are impossible in
any case. The kernel can't send a netlink message if the socket
buffer is full: the message will be dropped and the kernel and the
user-space process will no longer have the same view of kernel state.
It is up to the application to detect when this happens (via the
ENOBUFS error returned by recvmsg(2)) and resynchronize.
I addressed this problem by increasing the size of the socket receive buffer (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, ...), or nl_socket_set_buffer_size() if you are using libnl).
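For example, a rough sketch, assuming fd is the file descriptor of the netlink socket and sock is the libnl socket handle; the 1 MiB size is just a starting point to tune:
#include <sys/socket.h>
#include <stdio.h>
#include <netlink/socket.h>   /* libnl, for nl_socket_set_buffer_size() */

/* Grow the receive buffer of a netlink socket so bursts of messages from
 * the kernel are less likely to be dropped with ENOBUFS. */
static int grow_rx_buffer(int fd, struct nl_sock *sock)
{
    int rcvbuf = 1024 * 1024;   /* 1 MiB; tune to your message rate */

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0) {
        perror("setsockopt(SO_RCVBUF)");
        return -1;
    }
    /* Equivalent libnl call (rxbuf, txbuf); 0 leaves the tx size at its default. */
    return nl_socket_set_buffer_size(sock, rcvbuf, 0);
}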

Related

Socket/Interface filter Ioctl callback not getting called

I have a small C program that does the following:
int fd;
fd = socket(AF_INET, SOCK_STREAM, 0);
ioctl(fd, ..., ...);
close(fd);
My kext ioctl callback looks as follows:
errno_t
Router::Ioctl(void *cookie, socket_t so, unsigned long request, const char* argp)
{
    IOLog("-- SOCK IOCTL Request is %lu\n", request);
    return KERN_SUCCESS;
}
When I open a web page in Safari and such, I can see the output in the Console application:
default 16:09:08.557920 +0200 kernel -- IOCTL Request is 2147772030
But when I execute my C program, it seems like it 'skips' my kext and nothing is printed.
Ideas on why this happens?
EDIT
Just for the sake of it, I also tried doing the same with an Interface filter - same result
Not a conclusive answer as such, as your sample code is extremely vague and incomplete. A few suggestions and extra information that might help diagnose this:
What does your sflt_register() call look like? Your socket is a datagram (UDP) socket, but Safari is probably using a stream (TCP) socket. Socket filters are for one type of socket.
What ioctl are you performing in your test program? (Are you aware that custom ioctls on sockets are not supported?)
Do you see other (non-ioctl) socket events in your filter?
What were you expecting in the interface filter? Ioctls to a socket don't propagate to the interface.
Finally, be aware that NKEs are effectively deprecated. If you will likely need NKE functionality over the next few macOS releases, you need to get in touch with Apple. (Radar enhancement request and/or DTS)
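On the first point, note that a socket filter only sees sockets of the (domain, type, protocol) triple it was registered for, so TCP and UDP need separate registrations. A rough sketch; gTcpFilter and gUdpFilter are hypothetical sflt_filter descriptors with your callbacks and unique handles filled in:
#include <sys/kpi_socketfilter.h>
#include <netinet/in.h>

// Hypothetical filter descriptors: fill in sf_name, a unique sf_handle,
// and the attach/detach/ioctl callbacks from the kext above.
extern struct sflt_filter gTcpFilter;
extern struct sflt_filter gUdpFilter;

static errno_t RegisterFilters(void)
{
    errno_t err;

    // Each registration covers exactly one (domain, type, protocol) combination.
    err = sflt_register(&gTcpFilter, PF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (err != 0)
        return err;

    return sflt_register(&gUdpFilter, PF_INET, SOCK_DGRAM, IPPROTO_UDP);
}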

Successfully de-referenced userspace pointer in kernel space without using copy_from_user()

There's a bug in a driver within our company's codebase that's been there for years.
Basically, we make calls to the driver via ioctls. The data passed between userspace and driver space is stored in a struct, and a pointer to that data is passed to the ioctl. The driver is supposed to copy the data out with copy_from_user(), but this code hasn't been doing that for years; it has just been dereferencing the userspace pointer directly. And as far as I know, it hadn't caused any issues until now.
I'm wondering how this code hasn't caused any problems for so long. In what cases does dereferencing a userspace pointer directly in kernel space not cause a problem?
In userspace
struct InfoToDriver_t data;
data.cmd = DRV_SET_THE_CLOCK;
data.speed = 1000;
ioctl(driverFd, DEVICE_XX_DRIVER_MODIFY, &data);
In the driver
int device_xx_driver_ioctl_handler(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
    struct InfoToDriver_t *user_data;
    struct InfoToDriver_t kernel_copy;

    switch (cmd)
    {
    case DEVICE_XX_DRIVER_MODIFY:
        // what we've been doing for years, BAD
        // but somehow never caused a kernel oops until now
        user_data = (struct InfoToDriver_t *)arg;
        if (user_data->cmd == DRV_SET_THE_CLOCK)
        { .... }

        // what we're supposed to do: copy into a kernel-side struct first
        if (copy_from_user(&kernel_copy, (void __user *)arg, sizeof(kernel_copy)))
            return -EFAULT;
        if (kernel_copy.cmd == DRV_SET_THE_CLOCK)
        { ... }
A possible answer is: this depends on the architecture. As you have seen, on a sane architecture (such as x86 or x86-64), simply dereferencing __user pointers just works. But Linux tries to support every possible architecture, and there are architectures where a simple dereference does not work; otherwise copy_to_user()/copy_from_user() wouldn't exist.
Another reason for copy_to_user()/copy_from_user() is the possibility that the usermode side modifies its memory simultaneously with the kernel side (in another thread). You cannot assume that the contents of usermode memory are frozen while the kernel accesses them. Rogue usermode code can use this to attack the kernel. For example, you can probe the pointer to the output data before doing the work, but by the time you copy the result back to usermode, that pointer may already be invalid. Oops. The copy_to_user API ensures (or should ensure) that the kernel won't crash during the copy; instead, the guilty application is killed.
A safer approach is to copy the whole usermode data structure into the kernel (a 'capture') and check this copy for consistency.
The bottom line: if this driver is proven to work well on a certain architecture and there are no plans to port it, there's no urgency to change it. But check the robustness of the kernel code carefully: decide whether a capture of the usermode data is needed, and whether problems may arise while copying from usermode.
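To illustrate the capture idea with the struct from the question (DRV_SET_THE_CLOCK comes from the question; MAX_SUPPORTED_SPEED and set_the_clock() are made up for the example):
/* Sketch of the "capture" approach: copy the whole usermode struct into the
 * kernel once, then validate and use only that kernel copy, so a usermode
 * thread cannot change the data between the check and the use. */
static long device_xx_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
    struct InfoToDriver_t kdata;                      /* kernel-side capture */

    if (copy_from_user(&kdata, (void __user *)arg, sizeof(kdata)))
        return -EFAULT;

    if (kdata.cmd != DRV_SET_THE_CLOCK)               /* validate the copy ... */
        return -EINVAL;
    if (kdata.speed > MAX_SUPPORTED_SPEED)            /* hypothetical sanity bound */
        return -EINVAL;

    return set_the_clock(kdata.speed);                /* ... and act only on it */
}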

Kernel Crashes due to OOM error (USB_SUBMIT_URB)

Scenario :
I am calling usb_submit_urb in an ioctl call to send audio packets from the application.
The implementation is as follows:
retval = copy_from_user(&pkt_1722, pkt_1722_userspace,
                        sizeof(struct ifr_data_struct_1722));
if (retval) {
    printk("copy_from_user error: pkt_1722\n");
    retval = -EINVAL;
}

usb_init_urb(bky_urb_alloc[bky_urb_context.bky_i]);
usb_fill_bulk_urb(bky_urb_alloc[bky_urb_context.bky_i],
                  dev->udev,
                  priv->avb_a_out,
                  (void *)dma_buf[bky_urb_context.bky_i],
                  112,
                  bky_write_bulk_callback,
                  &bky_urb_context);

retval = usb_submit_urb(bky_urb_alloc[bky_urb_context.bky_i], GFP_ATOMIC);
if (retval) {
    printk(KERN_INFO "%s - failed submitting write urb, error %d",
           __FUNCTION__, retval);
    goto error;
}
I am maintaining an array of URBs so that I can reuse them after their completion handlers get called. The allocation of the URBs and dma_buf takes place once, in probe().
Problem :
I am able to stream 1722 packets for a few hours; after that the kernel crashes, and all I see is a black screen with call traces and an OOM (out of memory) error. The PID blamed for the error belongs to some other kernel process running in the background that tries to allocate pages, fails, and triggers the OOM, crashing the kernel.
Maybe this problem is due to external fragmentation that builds up over time.
Any inputs will be of great help.
1) Are the USB URBs being consumed by something? Whoever holds the pointer to the block is responsible for passing it on or freeing the buffer.
2) Have you set vm.min_free_kbytes in /etc/sysctl.conf to at least 1% of system memory?
3) While the system runs, capture /proc/slabinfo in a shell loop and see if there is a leak somewhere.
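On point 1, the completion callback is where the URB should be handed back for reuse (or freed, if it had been allocated per submission). A rough sketch; the bky_* names follow the question, while the context type and its fields are made up for illustration:
/* If a preallocated URB is never marked free again (or a per-submission URB
 * is never usb_free_urb()'d), each ioctl leaks memory until the OOM killer
 * eventually fires, which matches the symptom described above. */
static void bky_write_bulk_callback(struct urb *urb)
{
    struct bky_urb_ctx *ctx = urb->context;       /* hypothetical context type */

    if (urb->status)
        dev_err(&urb->dev->dev, "bulk write failed: %d\n", urb->status);

    atomic_set(&ctx->in_flight, 0);               /* hypothetical "slot free" flag */
}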

Serial communication with minimal delay

I have a computer which is connected to external devices via serial communication (i.e. RS-232/RS-422 on physical or emulated serial ports). They communicate with each other through frequent data exchange (30 Hz) with only small data packets (less than 16 bytes per packet).
The most critical requirement of the communication is low latency or delay between transmitting and receiving.
The data exchange pattern is handshake-like. The host device initiates communication and keeps sending notifications to a client device. The client device needs to reply to every notification from the host device as quickly as possible (this is exactly where the low latency needs to be achieved). The data packets of notifications and replies are well defined, i.e. the data length is known.
And basically, data loss is not allowed.
I have used the following common Win API functions to do the I/O reads/writes in a synchronous manner:
CreateFile, ReadFile, WriteFile
The client device uses ReadFile to read data from the host device. Once the client has read the complete data packet, whose length is known, it uses WriteFile to reply to the host device with the corresponding data packet. The reads and writes are always sequential, without concurrency.
Somehow the communication is not fast enough; the time between sending and receiving is simply too long. I guess it could be a problem with serial port buffering or interrupts.
Here I summarize some possible actions to reduce the delay.
Please give me some suggestions and corrections :)
call CreateFile with the FILE_FLAG_NO_BUFFERING flag? I am not sure if this flag is relevant in this context.
call FlushFileBuffers after each WriteFile? Or any other action that can notify/interrupt the serial port to transmit data immediately?
set a higher priority for the thread and process handling the serial communication (see the sketch below)
set the latency timer or transfer size for emulated devices (via their driver). But what about physical serial ports?
is there anything on Windows equivalent to setserial/low_latency under Linux?
disable FIFO?
thanks in advance!
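Regarding point 3 in the list above, a minimal sketch of what raising the thread/process priority would look like; whether it actually helps depends on the rest of the system load:
#include <windows.h>

// Raise the priority of the process/thread servicing the serial port so a
// pending reply gets scheduled as soon as data arrives.
void BoostSerialPriority(void)
{
    SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
}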
I solved this in my case by setting the comm timeouts to {MAXDWORD,0,0,0,0}.
After years of struggling with this, on this very day I was finally able to make my serial comms terminal thingy fast enough with Microsoft's CDC-class USB UART driver (USBSER.SYS, which is now built into Windows 10, making it actually usable).
Apparently the aforementioned set of values is a special case that sets minimal timeouts as well as minimal latency (at least with the Microsoft driver, or so it seems to me) and also causes ReadFile to return immediately if no new characters are in the receive buffer.
Here's my code (Visual C++ 2008, project character set changed from "Unicode" to "Not set" to avoid LPCWSTR type cast problem of portname) to open the port:
static HANDLE port = 0;
static COMMTIMEOUTS originalTimeouts;
static bool ComSetParams(HANDLE port, int baud);   // prototype; defined below

static bool OpenComPort(char* p, int targetSpeed) { // e.g. OpenComPort("COM7",115200);
    char portname[16];
    sprintf(portname, "\\\\.\\%s", p);
    port = CreateFile(portname, GENERIC_READ|GENERIC_WRITE, 0, 0, OPEN_EXISTING, 0, 0);
    if (port == INVALID_HANDLE_VALUE) {  // CreateFile returns INVALID_HANDLE_VALUE, not NULL, on failure
        printf("COM port is not valid: %s\n", portname);
        return false;
    }
    if (!GetCommTimeouts(port, &originalTimeouts)) {
        printf("Cannot get comm timeouts\n");
        return false;
    }
    COMMTIMEOUTS newTimeouts = {MAXDWORD, 0, 0, 0, 0};
    SetCommTimeouts(port, &newTimeouts);
    if (!ComSetParams(port, targetSpeed)) {
        SetCommTimeouts(port, &originalTimeouts);
        CloseHandle(port);
        printf("Failed to set COM parameters\n");
        return false;
    }
    printf("Successfully set COM parameters\n");
    return true;
}
static bool ComSetParams(HANDLE port, int baud) {
    DCB dcb;
    memset(&dcb, 0, sizeof(dcb));
    dcb.DCBlength = sizeof(dcb);
    dcb.BaudRate = baud;
    dcb.fBinary = 1;
    dcb.Parity = NOPARITY;
    dcb.StopBits = ONESTOPBIT;
    dcb.ByteSize = 8;
    return SetCommState(port, &dcb) != 0;
}
And here's a USB trace of it working. Please note the OUT transactions (output bytes) followed by IN transactions (input bytes) and then more OUT transactions (output bytes) all within 3 milliseconds:
And finally, since if you are reading this, you might be interested to see my function that sends and receives characters over the UART:
unsigned char outbuf[16384];
unsigned char inbuf[16384];
unsigned char *inLast = inbuf;
unsigned char *inP = inbuf;
unsigned long bytesWritten;
unsigned long bytesReceived;
// Read character from UART and while doing that, send keypresses to UART.
unsigned char vgetc() {
while (inP >= inLast) { //My input buffer is empty, try to read from UART
while (_kbhit()) { //If keyboard input available, send it to UART
outbuf[0] = _getch(); //Get keyboard character
WriteFile(port,outbuf,1,&bytesWritten,NULL); //send keychar to UART
}
ReadFile(port,inbuf,1024,&bytesReceived,NULL);
inP = inbuf;
inLast = &inbuf[bytesReceived];
}
return *inP++;
}
Large transfers are handled elsewhere in code.
On a final note, apparently this is the first fast UART code I've managed to write since abandoning DOS in 1998. O, doest the time fly when thou art having fun.
This is where I found the relevant information: http://www.egmont.com.pl/addi-data/instrukcje/standard_driver.pdf
I have experienced a similar problem with a serial port.
In my case I resolved the problem by decreasing the latency of the serial port.
You can change the latency of each port (which by default is set to 16 ms) using Control Panel.
You can find the method here:
http://www.chipkin.com/reducing-latency-on-com-ports/
Good Luck!!!

ReadFile doesn't work asynchronously on Win7 and Win2k8

According to MSDN, ReadFile can read data in two different ways: synchronously and asynchronously.
I need the second one. The following code demonstrates usage with an OVERLAPPED struct:
#include <windows.h>
#include <stdio.h>
#include <time.h>
void Read()
{
    HANDLE hFile = CreateFileA("c:\\1.avi", GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if ( hFile == INVALID_HANDLE_VALUE )
    {
        printf("Failed to open the file\n");
        return;
    }

    int dataSize = 256 * 1024 * 1024;
    char* data = (char*)malloc(dataSize);
    memset(data, 0xFF, dataSize);

    OVERLAPPED overlapped;
    memset(&overlapped, 0, sizeof(overlapped));

    printf("reading: %d\n", time(NULL));
    BOOL result = ReadFile(hFile, data, dataSize, NULL, &overlapped);
    printf("sent: %d\n", time(NULL));

    DWORD bytesRead;
    result = GetOverlappedResult(hFile, &overlapped, &bytesRead, TRUE); // wait until completion - returns immediately
    printf("done: %d\n", time(NULL));

    CloseHandle(hFile);
}
int main()
{
Read();
}
On Windows XP output is:
reading: 1296651896
sent: 1296651896
done: 1296651899
This means that ReadFile didn't block and returned immediately within the same second, whereas the reading process continued for 3 seconds. That is normal async reading.
But on windows 7 and windows 2008 I get following results:
reading: 1296661205
sent: 1296661209
done: 1296661209.
This is the behavior of synchronous reading.
MSDN says that an async ReadFile can sometimes behave synchronously (when the file is compressed or encrypted, for example). But in that situation the return value should be TRUE and GetLastError() == NO_ERROR.
On Windows 7 I get FALSE and GetLastError() == ERROR_IO_PENDING. So the WinAPI tells me that it is an async call, but when I look at the test I see that it is not!
I'm not the only one who found this "bug": read the comments on the ReadFile MSDN page.
So what's the solution? Does anybody know? It has been 14 months since Denis found this strange behavior.
I don't know the size of the "c:\1.avi" file, but the buffer you give to Windows (256 MB!) is probably big enough to hold the whole file. So Windows decides to read the whole file and put it in the buffer the way it likes. You don't tell Windows "I want async"; you tell it "I know how to handle async".
Just change the buffer size to, say, 1024, and your program will behave exactly the same, but read only 1024 bytes (and return ERROR_IO_PENDING as well).
In general, you do asynchronous I/O because you want to do something else during the operation. Look at the sample here: Testing for the End of a File, as it demonstrates an async ReadFile. If you change the sample's buffer to a big value, it should behave exactly like yours.
PS: I suggest you don't rely on time samples to check things; use return codes and events.
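For reference, a minimal sketch of that event-based pattern (manual-reset event in the OVERLAPPED struct, small buffer, do other work while the read is pending); the file name and sizes are just placeholders:
#include <windows.h>
#include <stdio.h>

void ReadAsyncSketch(void)
{
    HANDLE hFile = CreateFileA("c:\\1.avi", GENERIC_READ, 0, NULL,
                               OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return;

    char buffer[4096];                                         // small buffer, unlike 256 MB
    OVERLAPPED overlapped = {0};
    overlapped.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);  // manual-reset event

    DWORD bytesRead = 0;
    BOOL ok = ReadFile(hFile, buffer, sizeof(buffer), NULL, &overlapped);
    if (ok || GetLastError() == ERROR_IO_PENDING)
    {
        // ... do other useful work here while the read is in flight ...
        GetOverlappedResult(hFile, &overlapped, &bytesRead, TRUE);  // then wait for completion
    }
    printf("read %lu bytes\n", (unsigned long)bytesRead);

    CloseHandle(overlapped.hEvent);
    CloseHandle(hFile);
}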
According to this, I would suspect that it should return TRUE in your case. But it may also be that the completion mode's default settings are different on Win7/Win2k8.
Try setting a different mode with SetFileCompletionNotificationModes().
Have you tried using an event, as @Simon Mourier suggested? I know the documentation says that the event is not required, but if you look at the example in the links provided by @Simon Mourier, it uses an event for the asynchronous read.
Windows 7/Server 2008 have different behavior to resolve a race condition that can occur in GetOverlappedResultEx. When you compile for these OSes, Windows detects this and uses the different behavior. I find this wicked confusing.
Here is a link:
http://msdn.microsoft.com/en-us/library/dd371711(VS.85).aspx
I'm sure you've read this many times in the past, but some of the text has changed since Win7, especially regarding the hEvent field in the OVERLAPPED struct:
http://msdn.microsoft.com/en-us/library/ms684342(v=VS.85).aspx
Functions such as GetOverlappedResult and the synchronization wait functions reset auto-reset events to the nonsignaled state. Therefore, you should use a manual reset event; if you use an auto-reset event, your application can stop responding if you wait for the operation to complete and then call GetOverlappedResult with the bWait parameter set to TRUE.
Could you do an experiment: allocate a manual-reset event in your OVERLAPPED struct instead of an auto-reset event? (I don't see the allocation in your snippet; don't forget to create the event and to set hEvent after zeroing the struct.)
This probably has something to do with caching. Try opening the file non-cached (FILE_FLAG_NO_BUFFERING).
EDIT
This is actually documented in the MSDN documentation for ReadFile:
Note: If a file or device is opened for asynchronous I/O, subsequent calls to functions such as ReadFile using that handle generally return immediately, but can also behave synchronously with respect to blocked execution. For more information, see http://support.microsoft.com/kb/156932.
