netfilter speed limit - linux-kernel

I am testing a netlink filter application on a 1 Gbit/s network: one user-space function sends verdicts to the netlink socket; another user-space routine asynchronously reads marked packets from the netlink socket and runs a custom filter function. At bitrates above 300 Mbps I see netlink socket read errors: "no buffer space available". I take this to be a netlink buffer overflow.
Can someone recommend an approach to improving netlink throughput on a high-speed network? My kernel version is 2.6.38.

There is a socket between the kernel and user space, and packets are passed up to user space through it. When that socket's receive buffer is full, you get this error (ENOBUFS).
In C you can set the socket buffer size and increase it (for a netlink socket this is done with setsockopt() and SO_RCVBUF).
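As a minimal sketch of that suggestion, the helper below requests a larger receive buffer and reports what the kernel actually granted. The name grow_rcvbuf is made up for this illustration, and fd is assumed to be the descriptor of the netlink socket (if the application happens to use libnetfilter_queue, that would be the value returned by nfq_fd()).

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: ask the kernel for a receive buffer of `want` bytes
 * on `fd` and return the size actually in effect afterwards, or -1 on error. */
static int grow_rcvbuf(int fd, int want)
{
    int got = 0;
    socklen_t len = sizeof(got);

    /* Request the new size; the kernel may silently clamp it. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want)) < 0)
        return -1;

    /* Read back the effective size so the caller can see any clamping. */
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) < 0)
        return -1;

    return got;
}
```

On Linux the value read back is normally double the request (the kernel reserves room for bookkeeping), and the request is capped at net.core.rmem_max, so that sysctl may need raising as well. A process with CAP_NET_ADMIN can use SO_RCVBUFFORCE to go beyond the cap. A bigger buffer only absorbs bursts; if the reader is slower than the sustained packet rate, the buffer will still fill up eventually.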

Related

What does to_ms = -1 do on winpcap and linux libpcap?

I am trying to port a libpcap-based program to macOS; it seems to have been written for Windows and Linux. In the pcap_open_live call the read timeout is set to -1 (the same goes for PacketOpen), and on macOS this causes an error when trying to open the interface: BIOCSRTIMEOUT: Invalid Argument. I am unable to find any documentation on what a -1 read timeout actually does. Additionally, is there a way to get the same behaviour on a BPF-based libpcap?
What does to_ms = -1 do on winpcap and linux libpcap?
Nothing predictable. To quote the tip-of-the-master-branch pcap(3pcap) man page:
packet buffer timeout
If, when capturing, packets are delivered as soon as they
arrive, the application capturing the packets will be woken up
for each packet as it arrives, and might have to make one or
more calls to the operating system to fetch each packet.
If, instead, packets are not delivered as soon as they arrive,
but are delivered after a short delay (called a "packet buffer
timeout"), more than one packet can be accumulated before the
packets are delivered, so that a single wakeup would be done for
multiple packets, and each set of calls made to the operating
system would supply multiple packets, rather than a single
packet. This reduces the per‐packet CPU overhead if packets are
arriving at a high rate, increasing the number of packets per
second that can be captured.
The packet buffer timeout is required so that an application
won’t wait for the operating system’s capture buffer to fill up
before packets are delivered; if packets are arriving slowly,
that wait could take an arbitrarily long period of time.
Not all platforms support a packet buffer timeout; on platforms
that don’t, the packet buffer timeout is ignored. A zero value
for the timeout, on platforms that support a packet buffer
timeout, will cause a read to wait forever to allow enough packets
to arrive, with no timeout.
NOTE: the packet buffer timeout cannot be used to cause calls
that read packets to return within a limited period of time,
because, on some platforms, the packet buffer timeout isn’t
supported, and, on other platforms, the timer doesn’t start until
at least one packet arrives. This means that the packet buffer
timeout should NOT be used, for example, in an interactive
application to allow the packet capture loop to ‘‘poll’’ for
user input periodically, as there’s no guarantee that a call
reading packets will return after the timeout expires even if no
packets have arrived.
Nothing is said there about a negative timeout; I'll update it to explicitly say that a negative value should not be used. (Not on Windows, not on macOS, not on Linux, not on *BSD, not on Solaris, not on AIX, not on HP-UX, not on Tru64 UNIX, not on IRIX, not on anything.)
By setting the timeout to -1, they probably intended to put the pcap_t into "non-blocking mode", where an attempt to read will return immediately if there are no packets waiting to be read, rather than waiting for a packet to arrive. So, instead, provide a timeout of, for example, 100 (meaning 1/10 second) and use pcap_setnonblock() after the pcap_open_live() call to put the pcap_t into non-blocking mode. That should work on all platforms.

TCP retransmission for the same packet but with the different TCP payload

I'm developing an Ethernet driver on Linux. I found that when a TCP retransmission occurred, the TCP payloads of the retransmitted packets differed from each other even though they carried the same sequence number. I can't understand why this would happen. In my driver, I just allocate a normal network device without any special flags. The TCP checksum field was also wrong in these retransmitted packets, whereas the checksum in all other types of TCP packets, such as SYN, ACK, and DUP ACK, was correct.
I captured the packets with Wireshark, which means the captured packets had not yet been handled by my driver; they came straight from the TCP stack in the Linux kernel. When I tested with other Ethernet devices and drivers, the problem did not occur. So my questions are as follows.
Is it possible for the TCP stack to retransmit the same packet with a different payload?
Which Linux kernel parameters could cause this problem?
How could my driver cause this problem?
Thanks for any replies.
I found the reason for this problem. It was a very foolish mistake.
To make all the DMA buffer addresses (in my driver, skb->data) 4-byte aligned, I called memmove to shift the data. However, the data that skb->data points to is shared with the rest of the TCP/IP stack in the kernel. After this incorrect operation, when a TCP retransmission occurs, the TCP stack's own skb->data still holds the original address, where the bytes have now been shifted. The checksum, which was computed over the original data, therefore looks wrong in Wireshark. The code in my driver looked like this:
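/* Buggy: shifts the payload inside a buffer the TCP stack still references. */
u32 skb_len = skb->len;
u32 align = check_aligned(skb);	/* author's helper; apparently the misalignment in bytes */

if (!align)
	return skb;
skb_push(skb, align);				/* move skb->data back by align bytes */
memmove(skb->data, skb->data + align, skb_len);	/* overwrites the shared data */
skb_trim(skb, skb_len);				/* restore the original length */
return skb;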
I hope my experience helps others avoid the same mistake.
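The effect can be reproduced in user space with a small model. This is only a sketch: struct pkt_view is a made-up stand-in for an sk_buff header (two holders, one shared byte buffer), byte_sum stands in for the real TCP checksum, and align_by_copy illustrates the general idea of working on a private copy rather than any specific kernel API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an sk_buff header: each holder has its own
 * data pointer and length, but all holders share one underlying buffer. */
struct pkt_view {
    uint8_t *data;
    size_t len;
};

/* Plain byte sum, standing in for the real TCP checksum. */
static uint16_t byte_sum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];
    return (uint16_t)sum;
}

/* The buggy pattern: slide the payload down by `shift` bytes in place.
 * Requires `shift` bytes of headroom in front of v->data. Any other
 * holder of the same buffer now sees shifted bytes at its old pointer. */
static void align_in_place(struct pkt_view *v, size_t shift)
{
    v->data -= shift;
    memmove(v->data, v->data + shift, v->len);
}

/* A safer pattern: leave the shared bytes untouched and copy them into
 * a private, suitably aligned bounce buffer owned by the driver. */
static void align_by_copy(const struct pkt_view *v, uint8_t *bounce)
{
    memcpy(bounce, v->data, v->len);
}
```

With the 8-byte payload "ABCDEFGH" stored at offset 2 of a buffer, calling align_in_place with a shift of 2 leaves the driver's view reading "ABCDEFGH" at offset 0, while a second view that still points at offset 2 now reads "CDEFGHGH", so its byte sum no longer matches. In a real driver the corresponding options would be to check whether the buffer is shared (for example with skb_cloned()) and work on a private copy such as one made by skb_copy(), or to DMA from a separate aligned bounce buffer.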

Change max UDP Packet Size

It seems that I'm not able to receive UDP Packets with a message bigger than 4096 bytes.
Where can I change this limit?
Is it OS or network adapter related?
I see this issue on my Windows Server 2012 R2 machine, while it works fine on my Windows 8.1 PC.
Any hint would be much appreciated.
You need to raise the socket send buffer size at the sender and the socket receive buffer size at the receiver. However, the generally accepted practical limit on UDP payload size is 534 bytes. Above that, a datagram can be fragmented in transit, and if any fragment fails to arrive the entire datagram is lost.
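A rough sketch of the buffer-size suggestion, written against POSIX sockets: it sets SO_SNDBUF and SO_RCVBUF and then pushes one large datagram through the loopback interface. The function names and the 2-second receive timeout are choices made for this illustration. Winsock offers the same two options through setsockopt, but the headers and start-up calls differ, so this is not a drop-in answer for the Windows machines in the question.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Request `bytes` for both socket buffers; 0 on success, -1 on error. */
static int set_udp_buffers(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}

/* Send one datagram of `len` bytes over loopback and return the number
 * of bytes received, or -1 on any failure. */
static ssize_t udp_loopback_roundtrip(size_t len, int bufsize)
{
    static char out[65507], in[65507];   /* largest IPv4 UDP payload */
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };
    ssize_t got = -1;
    int rx, tx;

    if (len > sizeof(out))
        return -1;
    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto done;

    /* Bind the receiver to an ephemeral loopback port and look it up. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;
    if (getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
        goto done;

    if (set_udp_buffers(rx, bufsize) < 0 || set_udp_buffers(tx, bufsize) < 0)
        goto done;
    /* Do not block forever if the datagram is lost. */
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    memset(out, 'x', len);
    if (sendto(tx, out, len, 0, (struct sockaddr *)&addr, sizeof(addr)) != (ssize_t)len)
        goto done;
    /* The application buffer must also be large enough for the datagram. */
    got = recvfrom(rx, in, sizeof(in), 0, NULL, NULL);
done:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return got;
}
```

Calling udp_loopback_roundtrip(8000, 1 << 20) sends an 8000-byte datagram. Note that the buffer handed to recvfrom has to be at least as large as the datagram too; a 4096-byte application buffer would truncate it regardless of the socket buffer settings.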
According to the Microsoft documentation for socket options, there's a SO_MAX_MSG_SIZE option that is "the maximum outbound message size for message-oriented sockets supported by the protocol." UDP sockets are "message-oriented sockets" (as opposed to "stream-oriented sockets"; TCP sockets are stream-oriented).
This suggests that there is a maximum message size imposed by the operating system. Sadly, that page does not say "yes" in the "Set" column for the SO_MAX_MSG_SIZE row, so your program can't override that maximum.

USB2.0 Transfer using usb_submit_urb gives kernel panic

Scenario
I am building Ethernet packets in an application and transferring them over USB 2.0.
Inside the USB class driver, I issue a request to send these packets to a BULK endpoint using usb_submit_urb. My Ethernet packet size is 112 bytes. I can transfer 8000 packets in about 200 ms when presentation time is not taken into account.
I make ioctl calls at a very high rate, roughly one every 3-4 usec. Inside the ioctl I call usb_submit_urb, which is non-blocking, unlike usb_bulk_msg.
Problem
If I do take presentation time into account, the kernel panics and the dmesg log reads: kernel panic - not syncing : Fatal exception in interrupt. For information: when presentation time is used, packets wait in the hardware device until timestamp_in_packet == hardware time.
I need to understand how EHCI behaves under such conditions, and what state the endpoint queues can be in in such a scenario. I am working on an Ethernet-over-USB chip. What is the actual cause of such kernel panics?
Any input will help a lot.

Why would a device driver cause page faults?

I have a Windows console application that uses a parallel IO card for high-speed data transmission (General Standards HPDI32ALT).
My process runs in user mode; however, I am sure that somewhere behind the device's API there is some kernel-mode driver activity (PCI DMA transfers, reading device status registers, etc.). The working model is roughly this:
At startup: I request a pointer to an IO buffer from the API.
In my main loop:
  1. Block on the API waiting for room in the device's buffer (low watermark).
  2. Fill the IO buffer with transmission data.
  3. Begin transmission to the device by passing it the pointer to the IO buffer (during this time the API uses DMA on the PCI bus to move the data to the card).
  4. Block on the API waiting for IO to complete.
The application appears to be working correctly, with the proper data rate and sustained throughput for long periods of time. However, when I look at the process in the Sysinternals tool Process Explorer, I see a large number of page faults (~6k per second). I am moving ~30MB/s to the card.
I have plenty of RAM and am reasonably sure the page faults are not disk IO related.
Any thoughts on what could be causing the page faults? I also have a receive side to this application that uses an identical IO card in receive mode. The receive-mode use of the API does not cause a large number of page faults.
Could the act of moving the IO buffer to kernel mode cause page faults?
So your application asks the driver for a memory buffer and you copy the send data into that buffer? That's a pretty strange model, usually you let the application manage the buffers.
If you're faulting 6K pages/s and you're only transferring 30MB/s, you're almost getting a page fault for every page you transfer. When you get the data buffer from the driver, is it always zero filled? I'm wondering if you're getting demand zero faults for every transfer.
-scott
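The "almost one fault per page" estimate can be checked with a little arithmetic. The sketch below assumes 4 KiB pages (the usual x86 page size) and reads 30MB/s as 30 * 2^20 bytes per second; both are assumptions, and the function names are made up for this illustration.

```c
#include <stdint.h>

/* Pages moved per second at a given throughput (page_size must be non-zero). */
static uint64_t pages_per_second(uint64_t bytes_per_second, uint64_t page_size)
{
    return bytes_per_second / page_size;
}

/* Page faults per 100 transferred pages, rounded down.
 * The throughput must be at least one page per second. */
static unsigned faults_per_100_pages(uint64_t faults_per_second,
                                     uint64_t bytes_per_second,
                                     uint64_t page_size)
{
    uint64_t pages = pages_per_second(bytes_per_second, page_size);
    return (unsigned)(faults_per_second * 100 / pages);
}
```

Under those assumptions, 30 MB/s is 7680 pages per second, so 6000 faults per second works out to roughly 78 faults per 100 pages transferred, which fits the picture of a fresh page being faulted in for most of the data moved.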
