Ethernet checksum checking in wireless stack - linux-kernel

When receiving a raw ethernet packet over a wireless connection, where does the ethernet checksum get calculated, and where are errors handled?
Does the wireless stack handle this, or is it handled in the upper layers?

Checksums may be computed in various places. Recent Ethernet cards offload checksumming from the network stack. I have had to disable hardware checksumming to make network forensics easier: with the offload enabled, the hardware silently drops packets with bad checksums, so they never show up in a capture.

Usually, the Ethernet level FCS (Frame Check Sequence) is handled in the hardware MAC (Media Access Controller). Note that we are talking about a CRC here and not just a checksum (there isn't a "checksum" at the Ethernet frame level).
If an FCS mismatch is detected, the frame will most probably be discarded at the hardware MAC level, and a statistics counter will be updated.
In other words, there is no point "bothering" the software stack with an unusable frame.
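To make the FCS discussion concrete, here is a minimal C sketch of the check a MAC performs in hardware: the IEEE 802.3 CRC-32 over the frame, compared with the 4-byte FCS at its end. It assumes the captured buffer still contains the FCS, stored least-significant byte first as it appears on the wire. crc32_ieee and fcs_ok are illustrative names, not a kernel API. A bit-at-a-time loop like this is far too slow for line rate, which is one reason the check is left to hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 as used for the Ethernet FCS: reflected polynomial
   0xEDB88320, initial value 0xFFFFFFFF, final inversion. */
uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            /* shift right; XOR in the polynomial when the dropped bit was 1 */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Check a captured frame whose last 4 bytes are the FCS (least-significant
   byte first). Returns 1 if the FCS matches the frame contents, 0 otherwise. */
int fcs_ok(const uint8_t *frame, size_t len)
{
    if (len < 4)
        return 0;
    uint32_t stored = (uint32_t)frame[len - 4]
                    | (uint32_t)frame[len - 3] << 8
                    | (uint32_t)frame[len - 2] << 16
                    | (uint32_t)frame[len - 1] << 24;
    return crc32_ieee(frame, len - 4) == stored;
}
```

On a capture taken with the hardware told to keep bad frames, fcs_ok() would separate the frames the MAC would normally have discarded.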

As the other posters have said, the FCS is normally checked by the NIC itself or by the driver. However, when you read raw Ethernet frames, I think it depends entirely on the driver. For instance, with WiFi NICs that can be put into "monitor" or "promiscuous" mode, you usually don't want them to discard frames with a bad FCS, since that may signify exactly the error you are looking for.
One data point: the Intel 4965AGN Linux driver sets the FCS field in all captured packets to 0 in monitor mode. If you run Wireshark, you can see that it calculates the expected FCS and complains that the 0 field is invalid. Whether this means that the MAC discards frames with a bad FCS, or that those are also passed up, is unfortunately unclear.
So if the original question is "Do I have to check the FCS myself when capturing raw packets?", the answer in the 4965AGN case is "you can't", and may be "yes" if you get the real FCS from the NIC.

Most network hardware will allow you to set an option in the hardware to "store bad packets." This allows you to see packets in which the Ethernet CRC failed. If you pass a bad Ethernet frame to the stack, it will most likely be rejected because of a bad upper-layer checksum. The stack does not check Ethernet CRCs; this is left to the NIC, because CRC computation in software is time-consuming.
Keep in mind that stacked network protocols usually compute checksums at several points. A TCP segment carried over Ethernet is typically covered by the CRC at the link layer, the IP header checksum at the IP layer, and the TCP checksum at the TCP layer. The application may also verify the integrity of the data.
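The IP header checksum mentioned above is the RFC 1071 Internet checksum (a 16-bit one's complement sum), which is cheap enough to compute in software. Here is a minimal C sketch of verifying it over an IPv4 header. The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: one's complement sum of 16-bit big-endian
   words (an odd trailing byte is padded with zero), then inverted. */
uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)                      /* fold carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* An IPv4 header is intact when the checksum over the whole header,
   including the stored checksum field, comes out as 0. */
int ipv4_header_ok(const uint8_t *hdr, size_t len)
{
    if (len < 20)
        return 0;
    size_t ihl = (size_t)(hdr[0] & 0x0F) * 4;   /* header length in bytes */
    if ((hdr[0] >> 4) != 4 || ihl < 20 || ihl > len)
        return 0;
    return inet_checksum(hdr, ihl) == 0;
}
```

TCP and UDP use the same one's complement sum, computed over their segment plus a pseudo-header taken from the IP header.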

Related

How to set the UDP packet reassembly timeout in Windows 10

I am currently developing an image acquisition application in Visual C++ that receives image data from a UDP hardware device with limited capabilities (i.e. no UDP checksum). The device has a GBit connection to a dedicated switch, and the PC uses a dedicated NIC with a 10GBit connection to this switch.
The transmitted image data consists of packets with a size ranging from 6528 to 19680 bytes. These packets are fragmented by the hardware device and reconstructed by the network stack on the PC.
Sometimes a fragment of a packet (call it packet #4711) is lost, and the PC side keeps trying to reassemble it for a long time. Within this time span, the hardware device sends a new packet with the same packet ID, because the 16-bit packet ID has wrapped around. The PC now receives fragments of (a new) packet #4711 and uses them to complete the old, still-unassembled packet, producing a damaged packet. To top it off, the remaining fragments of the new #4711 are stored and combined with the next #4711 (which will be received a few seconds later). So the longer the system runs, the more packet IDs are compromised, until no communication is possible at all.
We cannot compute the UDP checksum on the hardware device because of its limited capabilities.
We cannot use IPv6 (which would offer larger packet IDs) because the hardware device does not support it.
We will have to implement our own protocol on top of UDP and "manually" fragment and reconstruct the data, but we could avoid this if we could find a way to reduce the packet reconstruction timeout on Windows to 500ms or less.
I searched Google and Stackoverflow for information, but there are not many results and none of them was of much help.
Hence the question: Is there a way to reduce the reconstruction timeout for IPv4 UDP fragments on Windows 10 via Registry, Windows API or any other magic or do you have a better suggestion?
Since Windows 2000 the timeout has been hardcoded, and there is no official way to modify the IP packet reassembly timeout because of strict RFC 2460 compliance.
Details can be read here:
https://blogs.technet.microsoft.com/nettracer/2010/06/03/why-doesnt-ipreassemblytimeout-registry-key-take-effect-on-windows-2000-or-later-systems/
Currently the only option seems to be raw sockets, which have been limited since Windows 7 and are not available with every socket provider. This would make the application much more complex.
We will change our software protocol so that no packets larger than 1400 bytes are sent at all. This forces us to handle fragmentation in our own software, but it avoids IP fragmentation and all of its traps. Perhaps this is the correct way to handle such problems.
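Here is a minimal C sketch of that application-level fragmentation. It assumes a hypothetical 8-byte header (32-bit image ID, 16-bit fragment index, 16-bit fragment count), so every datagram stays at or below 1400 bytes. Unlike IPv4 reassembly, the receiver throws away a partial image as soon as a datagram with a different ID arrives. A lost fragment therefore costs one image instead of poisoning later ones. The header layout, the names, and the one-image-in-flight policy are all assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DGRAM   1400   /* per-datagram limit chosen above */
#define HDR_LEN     8      /* hypothetical header: image id (4), index (2), count (2) */
#define MAX_PAYLOAD (MAX_DGRAM - HDR_LEN)
#define MAX_IMAGE   19680  /* largest image packet mentioned in the question */
#define MAX_FRAGS   ((MAX_IMAGE + MAX_PAYLOAD - 1) / MAX_PAYLOAD)

/* Number of datagrams needed for an image of len bytes (0 if invalid). */
uint16_t fragment_count(size_t len)
{
    if (len == 0 || len > MAX_IMAGE) return 0;
    return (uint16_t)((len + MAX_PAYLOAD - 1) / MAX_PAYLOAD);
}

/* Write datagram idx of image `id` into out (at least MAX_DGRAM bytes).
   Returns the datagram length to send, or 0 if idx is out of range. */
size_t build_fragment(uint32_t id, const uint8_t *data, size_t len,
                      uint16_t idx, uint8_t *out)
{
    uint16_t count = fragment_count(len);
    if (idx >= count) return 0;
    size_t off = (size_t)idx * MAX_PAYLOAD;
    size_t n = (len - off < MAX_PAYLOAD) ? len - off : MAX_PAYLOAD;
    out[0] = (uint8_t)(id >> 24);   out[1] = (uint8_t)(id >> 16);
    out[2] = (uint8_t)(id >> 8);    out[3] = (uint8_t)id;
    out[4] = (uint8_t)(idx >> 8);   out[5] = (uint8_t)idx;
    out[6] = (uint8_t)(count >> 8); out[7] = (uint8_t)count;
    memcpy(out + HDR_LEN, data + off, n);
    return HDR_LEN + n;
}

/* Receiver state for the one image currently being reassembled. */
struct reasm {
    int      active;
    uint32_t id;
    uint16_t count;
    uint32_t got;      /* bit i set once fragment i has arrived */
    size_t   total;    /* image length, known once the last fragment arrives */
    uint8_t  buf[MAX_IMAGE];
};

/* Feed one received datagram. Returns the image length once all fragments
   are in, 0 if more are needed, -1 for a malformed datagram. */
long reasm_feed(struct reasm *r, const uint8_t *d, size_t len)
{
    if (len <= HDR_LEN) return -1;
    uint32_t id = (uint32_t)d[0] << 24 | (uint32_t)d[1] << 16
                | (uint32_t)d[2] << 8 | d[3];
    uint16_t idx = (uint16_t)(d[4] << 8 | d[5]);
    uint16_t count = (uint16_t)(d[6] << 8 | d[7]);
    size_t n = len - HDR_LEN;
    if (count == 0 || count > MAX_FRAGS || idx >= count || n > MAX_PAYLOAD)
        return -1;
    if (idx + 1 < count && n != MAX_PAYLOAD)   /* only the last may be short */
        return -1;
    size_t off = (size_t)idx * MAX_PAYLOAD;
    if (off + n > MAX_IMAGE) return -1;

    /* A new image id discards any stale partial image immediately. */
    if (!r->active || r->id != id || r->count != count) {
        r->active = 1;  r->id = id;  r->count = count;
        r->got = 0;     r->total = 0;
    }
    memcpy(r->buf + off, d + HDR_LEN, n);
    r->got |= 1u << idx;
    if (idx + 1 == count) r->total = off + n;
    if (r->got == (1u << count) - 1u) {
        r->active = 0;
        return (long)r->total;
    }
    return 0;
}
```

The sender loops idx from 0 to fragment_count(len) - 1 and passes each build_fragment() result to sendto(). The receiver calls reasm_feed() for every datagram returned by recvfrom(). A 32-bit image ID also takes far longer to wrap around than the 16-bit IPv4 Identification field.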

How do tools like iperf measure UDP?

Given that UDP packets don't actually send acks, how does a program like iperf measure their one-way performance, i.e., how can it confirm that the packets actually reached:
within a time frame
intact, and uncorrupted
By contrast, it intuitively seems to me that with TCP, which sends acks back, rigorous benchmarking of packet movement across a network can be done very reliably from a client.
1/ "how can it confirm that the packets actually reached [...] intact, and uncorrupted"
UDP is an unfairly despised protocol, but come on, this is going way too far here! :-)
UDP has a checksum, just like TCP:
https://en.wikipedia.org/wiki/User_Datagram_Protocol#Checksum_computation
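For reference, here is a minimal C sketch of the UDP-over-IPv4 checksum from RFC 768. The one's complement sum covers a pseudo-header (source and destination addresses, protocol 17, UDP length) as well as the UDP header and payload. A transmitted value of 0 means "no checksum". The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Add bytes to a one's complement accumulator as 16-bit big-endian words. */
static uint32_t sum16(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) { sum += (uint32_t)p[0] << 8 | p[1]; p += 2; len -= 2; }
    if (len) sum += (uint32_t)p[0] << 8;     /* pad an odd byte with zero */
    return sum;
}

/* Inverted folded sum over pseudo-header + UDP header + data. */
static uint16_t udp4_raw(const uint8_t src[4], const uint8_t dst[4],
                         const uint8_t *udp, size_t udp_len)
{
    const uint8_t pseudo[12] = { src[0], src[1], src[2], src[3],
                                 dst[0], dst[1], dst[2], dst[3],
                                 0, 17,  /* zero byte, protocol number of UDP */
                                 (uint8_t)(udp_len >> 8), (uint8_t)udp_len };
    uint32_t sum = sum16(sum16(0, pseudo, sizeof pseudo), udp, udp_len);
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Sender: compute the checksum with the header's checksum field set to 0.
   A result of 0 is sent as 0xFFFF, because 0 on the wire means "none". */
uint16_t udp4_checksum(const uint8_t src[4], const uint8_t dst[4],
                       const uint8_t *udp, size_t udp_len)
{
    uint16_t c = udp4_raw(src, dst, udp, udp_len);
    return c ? c : 0xFFFF;
}

/* Receiver: 1 if the datagram is intact or carries no checksum at all. */
int udp4_verify(const uint8_t src[4], const uint8_t dst[4],
                const uint8_t *udp, size_t udp_len)
{
    if (udp_len < 8) return 0;
    if (udp[6] == 0 && udp[7] == 0) return 1;   /* sender did not compute one */
    return udp4_raw(src, dst, udp, udp_len) == 0;
}
```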
2/ "how can it confirm that the packets actually reached [...] within a time frame"
It does not, because this is not what UDP is about, nor TCP by the way.[*]
As can be seen from its source code here:
https://github.com/esnet/iperf/blob/master/src/iperf_udp.c#L55
...what it does do, though, is check for out-of-order packets. A "pcount" is set on the sending side and checked on the receiving side here:
https://github.com/esnet/iperf/blob/master/src/iperf_udp.c#L99
...and calculate a somewhat bogus jitter:
https://github.com/esnet/iperf/blob/master/src/iperf_udp.c#L110
(real life is more complicated than this: you have not only jitter, but also drift)
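The receiver-side bookkeeping described above can be sketched in C roughly as follows. This mirrors the idea (sequence numbers from the sender, plus the RFC 3550-style smoothed jitter estimate J += (|D| - J) / 16) rather than reproducing iperf's actual code. The struct and function names are hypothetical, and sequence numbers are assumed to start at 1.

```c
#include <stdint.h>

struct udp_stats {
    uint64_t expected;      /* highest sequence number seen so far */
    uint64_t lost;          /* skipped numbers; may turn out to be reordering */
    uint64_t out_of_order;  /* datagrams that arrived behind a higher number */
    double   prev_transit;  /* previous (arrival - send) time, seconds */
    double   jitter;        /* smoothed interarrival jitter, seconds */
    int      have_prev;
};

/* Account for one datagram: seq and sent come from its payload,
   arrived is the receiver's local clock. */
void udp_stats_update(struct udp_stats *s, uint64_t seq, double sent, double arrived)
{
    if (seq > s->expected) {
        s->lost += seq - s->expected - 1;   /* numbers jumped over */
        s->expected = seq;
    } else {
        s->out_of_order++;
        if (s->lost > 0)
            s->lost--;                      /* it was counted as lost earlier */
    }
    double transit = arrived - sent;        /* includes any clock offset */
    if (s->have_prev) {
        double d = transit - s->prev_transit;
        if (d < 0) d = -d;
        s->jitter += (d - s->jitter) / 16.0;
    }
    s->prev_transit = transit;
    s->have_prev = 1;
}
```

A constant offset between the two clocks cancels out in the difference of consecutive transit times, so this jitter estimate does not need synchronized clocks. Drift, however, does not cancel.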
[*]:
For semi-guaranteed, soft "within a time frame" / real-time protocols at layer 3 and above, look at RTP, RTSP and such. But neither TCP nor UDP inherently has this.
For a real, serious hard real-time guarantee, you have to go down to layer 2 protocols such as Ethernet AVB:
https://en.wikipedia.org/wiki/Audio_Video_Bridging
...which were designed because IP and above simply cannot. make. hard. real. time. guaranteed. delivery. Period.
EDIT:
This is another debate, but...
The first thing you need for "within a time frame", is a shared wall clock on sending/receiving systems (else, how could you tell that such received packet is out of date?)
From layer 3 (IP) and above, NTP's precision target is about 1 ms. It can do better than that on a LAN (but across IP networks, it's just taking a chance and hoping for the best).
On layer 2, aka the "LAN", PTP (Precision Time Protocol, IEEE 1588) targets the sub-microsecond range. That's about 1000 times more accurate. The same goes for the derived IEEE 802.1AS, "Timing and Synchronization for Time-Sensitive Applications" (gPTP), used in Ethernet AVB.
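Both NTP and PTP estimate the clock offset from a two-way exchange of timestamps, assuming the path delay is the same in each direction. Here is a minimal C sketch of the NTP form of that calculation (as in RFC 5905). t1 and t4 are read from the client clock, and t2 and t3 from the server clock. The names are illustrative.

```c
/* t1 = client send, t2 = server receive, t3 = server send,
   t4 = client receive, all in seconds. */
struct sync_estimate {
    double offset;   /* server clock minus client clock */
    double delay;    /* round-trip time excluding the server's hold time */
};

struct sync_estimate two_way_estimate(double t1, double t2, double t3, double t4)
{
    struct sync_estimate e;
    e.offset = ((t2 - t1) + (t3 - t4)) / 2.0;
    e.delay  = (t4 - t1) - (t3 - t2);
    return e;
}
```

Any asymmetry between the two directions shows up directly as offset error (half the difference between the two one-way delays). This is why the achievable precision depends so much on the network between the two clocks.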
Conclusion on this sub-topic:
TCP/IP, though very handy and powerful, is not designed to "guarantee delivery within a time frame". Be it TCP or UDP. Get this idea out of your head.
The obvious way would be to connect to a server that participates in the testing.
The client starts by (for example) connecting to an NTP server to get an accurate time base.
Then the UDP client sends a series of packets to the server. In its payload, each packet contains:
a serial number
a timestamp when it was sent
a CRC
The server then looks these over and notes whether any serial numbers are missing (after some reasonable timeout) and compares the time it received each packet to the time the client sent the packet. After some period of time, the server sends a reply indicating how many packets it received, the mean and standard deviation of the transmission times, and an indication of how many appeared to be corrupted based on the CRCs they contained.
Depending on taste, you might also want to set up a simultaneous TCP connection from the client to the server to coordinate testing of the UDP channel and (possibly) return results.
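Here is a minimal C sketch of the probe payload and the server-side bookkeeping described above. The 20-byte layout (serial, send time in nanoseconds, CRC-32 over the first 16 bytes), the names, and the choice of CRC-32 are assumptions. The one-way times are only meaningful if both clocks are synchronized, for example via NTP as suggested.

```c
#include <stddef.h>
#include <stdint.h>

#define PROBE_LEN 20   /* hypothetical: serial (8), send time ns (8), CRC-32 (4) */

static uint32_t crc32_bitwise(const uint8_t *p, size_t n)
{
    uint32_t c = 0xFFFFFFFFu;
    while (n--) {
        c ^= *p++;
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
    }
    return ~c;
}

static void put_be(uint8_t *p, uint64_t v, int bytes)
{
    for (int i = bytes - 1; i >= 0; i--) { p[i] = (uint8_t)v; v >>= 8; }
}

static uint64_t get_be(const uint8_t *p, int bytes)
{
    uint64_t v = 0;
    for (int i = 0; i < bytes; i++) v = v << 8 | p[i];
    return v;
}

/* Client: build one probe payload, big-endian fields. */
void probe_encode(uint8_t out[PROBE_LEN], uint64_t serial, uint64_t sent_ns)
{
    put_be(out, serial, 8);
    put_be(out + 8, sent_ns, 8);
    put_be(out + 16, crc32_bitwise(out, 16), 4);
}

struct probe_stats {
    uint64_t received, corrupted, missing, next_serial;
    double sum_ms, sum_sq_ms;   /* one-way transit times, for mean/stddev */
};

/* Server: account for one probe. Returns 1 if it passed the CRC check. */
int probe_receive(struct probe_stats *s, const uint8_t *p, size_t len, uint64_t recv_ns)
{
    if (len != PROBE_LEN || crc32_bitwise(p, 16) != (uint32_t)get_be(p + 16, 4)) {
        s->corrupted++;
        return 0;
    }
    uint64_t serial = get_be(p, 8), sent_ns = get_be(p + 8, 8);
    if (serial >= s->next_serial) {
        s->missing += serial - s->next_serial;   /* serials skipped so far */
        s->next_serial = serial + 1;
    } else if (s->missing > 0) {
        s->missing--;                            /* a late arrival fills a gap */
    }
    double ms = recv_ns >= sent_ns ? (double)(recv_ns - sent_ns) / 1e6
                                   : -(double)(sent_ns - recv_ns) / 1e6;
    s->received++;
    s->sum_ms += ms;
    s->sum_sq_ms += ms * ms;
    return 1;
}

double probe_mean_ms(const struct probe_stats *s)
{
    return s->received ? s->sum_ms / (double)s->received : 0.0;
}

/* Population variance in ms^2; the standard deviation is its square root. */
double probe_variance_ms2(const struct probe_stats *s)
{
    if (!s->received) return 0.0;
    double m = probe_mean_ms(s);
    return s->sum_sq_ms / (double)s->received - m * m;
}
```

A probe counted as missing is only tentative until the "reasonable timeout" expires, which is why a late arrival decrements the count again.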

Can the Rx/Tx Packet Buffer size be changed dynamically on a Linux NIC driver?

At the moment, the transmit and receive packet size is defined by a macro
#define PKT_BUF_SZ (VLAN_ETH_FRAME_LEN + NET_IP_ALIGN + 4)
So PKT_BUF_SZ comes to around 1524 bytes, which means my NIC can handle incoming packets from the network of up to 1524 bytes. Anything bigger than that causes the system to crash or, worse, reboot. I am using Linux kernel 2.6.32 on RHEL 6.0 with a custom FPGA NIC.
Is there a way to change PKT_BUF_SZ dynamically by getting the size of the incoming packet from the NIC? Would it add overhead? Should the hardware drop the packets before they reach the driver?
Any help/suggestion will be much appreciated.
This isn't something that can be answered without knowledge of the specific controller. They all work differently in details.
Some Broadcom NICs, for example, have different-sized pools of buffers from which the controller selects an appropriate one based on the frame size: for example, a pool of small (256-byte) buffers, a pool of standard-size (1536 or so) buffers, and a pool of jumbo buffers.
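The pool model can be pictured with a small C sketch of the selection step. The pool sizes follow the example above, except for the 9216-byte jumbo size, which is an assumption. In a real controller this choice is made in hardware, and the driver's job is typically to keep each pool refilled with buffers.

```c
#include <stddef.h>

/* Buffer sizes per pool: small, standard, jumbo (jumbo size is hypothetical). */
static const size_t pool_sizes[] = { 256, 1536, 9216 };
#define NPOOLS (sizeof pool_sizes / sizeof pool_sizes[0])

/* Index of the smallest pool whose buffers can hold frame_len bytes,
   or -1 if no pool is big enough (the controller would drop the frame). */
int pick_pool(size_t frame_len)
{
    for (size_t i = 0; i < NPOOLS; i++)
        if (frame_len <= pool_sizes[i])
            return (int)i;
    return -1;
}
```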
Some Intel NICs have allowed a list of fixed-size buffers together with a maximum frame size; the controller then pulls as many consecutive buffers as needed (I'm not sure Linux ever supported this usage, though, as it is much more complicated for software to handle).
But the most common model that most NICs use (and in fact, I believe all of the commercial ones can be used this way): they expect an entire frame to fit in a single buffer, and your single buffer size needs to accommodate the largest frame you will receive.
Given that your NIC is a custom FPGA one, only its designers can advise you on the specifics you're asking about. If Linux is crashing when larger packets come through, then most likely either your allocated buffer is not as large as you are telling the NIC it is (leading to overflow), or the NIC has a bug that causes it to write into some other memory area.
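Whatever the buffer model, the driver should never trust a reported frame length larger than the buffer it posted. Here is a minimal C sketch using a hypothetical descriptor layout for a custom NIC. It only catches the report: if the hardware has already written past the buffer, memory is corrupted regardless, so the NIC itself must be told the buffer size and must drop (or truncate) longer frames. In a real Linux driver, the drop would typically be counted in the device's rx_length_errors statistic.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical receive descriptor, as a custom FPGA NIC might fill it in. */
struct rx_desc {
    uint8_t *buf;        /* DMA buffer posted by the driver */
    uint16_t buf_len;    /* size the driver actually allocated */
    uint16_t frame_len;  /* length the hardware reports it received */
    uint8_t  done;       /* set by hardware when the frame is complete */
};

struct rx_stats { unsigned long packets, length_errors; };

/* Take one completed frame. Returns its length, 0 if the hardware still
   owns the descriptor, or -1 if the frame was dropped as too long. */
long rx_take(struct rx_desc *d, uint8_t *dst, size_t dst_len, struct rx_stats *st)
{
    if (!d->done)
        return 0;
    long ret;
    if (d->frame_len == 0 || d->frame_len > d->buf_len || d->frame_len > dst_len) {
        st->length_errors++;           /* drop instead of copying past the end */
        ret = -1;
    } else {
        memcpy(dst, d->buf, d->frame_len);
        st->packets++;
        ret = d->frame_len;
    }
    d->done = 0;                       /* hand the buffer back to the hardware */
    return ret;
}
```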

How to find out whether the packet transferred over UDP is lost or dropped?

I am new to networking and have a small question.
I am sending an alarm using SNMP to a target, but the alarm is not received at the target within the specified amount of time. I feel that the data may be lost or dropped.
My question is: on what basis should I conclude that there has been a loss or a drop?
Or could there be some other reason for the trap not to be received?
I'll assume that by "lost" you mean that one of the network devices (switch, firewall, ...) didn't forward it to the next hop, and by "dropped" that your network board didn't deliver it to your application (e.g. input buffer full, ...).
Under those assumptions, you have no way to know, in your application, that the packet has been "lost" or "dropped". If you want to be sure, you can install a network sniffer such as Wireshark on your computer to make sure your packet is delivered (though perhaps not processed by your application), or configure your network appliances (if you can) to log dropped packets (meaning "loss" across the network).
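If the trap receiver runs on Linux, one complementary check is the host's own UDP counters in /proc/net/snmp (also shown by netstat -su). For example, RcvbufErrors counts datagrams discarded because a socket receive buffer was full, and NoPorts counts datagrams for which nothing was listening on the destination port. Comparing a counter before and after the trap is sent tells you whether the target host dropped something. Here is a minimal C sketch that parses that format by column name, because the set of columns varies between kernel versions. The function names are illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Pointer just past `prefix` on the first line of s starting with it, or NULL. */
static const char *line_after(const char *s, const char *prefix)
{
    size_t n = strlen(prefix);
    while (s && *s) {
        if (strncmp(s, prefix, n) == 0)
            return s + n;
        s = strchr(s, '\n');
        if (s) s++;
    }
    return NULL;
}

/* Look up a counter by column name in text formatted like /proc/net/snmp:
   a "Udp:" header line followed by a "Udp:" value line.
   Returns the value, or -1 if the column is not present. */
long long udp_snmp_counter(const char *text, const char *name)
{
    const char *h = line_after(text, "Udp: ");
    const char *nl = h ? strchr(h, '\n') : NULL;
    const char *v = nl ? line_after(nl + 1, "Udp: ") : NULL;
    if (!v) return -1;
    size_t nlen = strlen(name);
    /* walk the header and value lines in step, token by token */
    while (*h && *h != '\n' && *v && *v != '\n') {
        size_t hl = strcspn(h, " \n"), vl = strcspn(v, " \n");
        if (hl == nlen && strncmp(h, name, nlen) == 0)
            return strtoll(v, NULL, 10);
        h += hl;
        v += vl;
        while (*h == ' ') h++;
        while (*v == ' ') v++;
    }
    return -1;
}
```

In practice you would read /proc/net/snmp into a buffer with fopen()/fread(), call udp_snmp_counter() before and after the test, and compare the results.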

How does the Linux kernel manage data that has been passed to a user program via DMA?

I was reading that in some network drivers it is possible via DMA to pass packets directly into user memory. In that case, how would it be possible for the kernel's TCP/IP stack to process the packets?
The short answer is that it doesn't. Data isn't going to be processed in more than one location at once, so if networking packets are passed directly to a user space program, then the kernel isn't going to do anything else with them; it has been bypassed. It will be up to the user space program to handle it.
An example of this was presented in a device drivers class I took a while back: high-frequency stock trading. There is an article about one such implementation at Forbes.com. The idea is that traders want their information as fast as possible, so they use specially crafted packets that, when received by equally specialized hardware, are presented directly to the trader's program, bypassing the relatively high-latency TCP/IP stack in the kernel. Here's an excerpt from that article about two such special network cards:
Both of these cards provide kernel bypass drivers that allow you to send/receive data via TCP and UDP in userspace. Context switching is an expensive (high-latency) operation that is to be avoided, so you will want all critical processing to happen in user space (or kernel-space if so inclined).
This technique can be used for just about any application where the latency between user programs and the hardware needs to be minimized, but as your question implies, it means that the kernel's normal mechanisms for handling such transactions are going to be bypassed.
Networking chips can have register entries that filter traffic by IP/UDP/TCP and port, and route the matching packets via a special set of DMA descriptors. If you pre-allocate DMA-able memory in the driver and mmap that memory into user space, you can route a particular stream of traffic to user space without any kernel code touching it.
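Here is a minimal C sketch of the user-space side of that arrangement: a ring of receive slots in a region that the driver would pre-allocate as DMA-able memory and map with mmap(). The NIC is the producer and the application, polling, is the consumer. The layout is entirely hypothetical, and ring_produce() merely stands in for the hardware so the sketch can run on its own. With a real device, the producer index would be written by the NIC or the driver, and the memory ordering would also have to account for DMA.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8      /* hypothetical; a power of two */
#define SLOT_BYTES 2048

/* Hypothetical layout of the shared, mmap()ed region. */
struct rx_ring {
    _Atomic uint32_t head;            /* advanced by the producer (the NIC) */
    _Atomic uint32_t tail;            /* advanced by the consumer (user space) */
    uint16_t len[RING_SLOTS];
    uint8_t  data[RING_SLOTS][SLOT_BYTES];
};

/* Stand-in for the hardware: deposit one frame; 0 if full or invalid. */
int ring_produce(struct rx_ring *r, const uint8_t *frame, uint16_t n)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (n == 0 || n > SLOT_BYTES || head - tail == RING_SLOTS)
        return 0;
    memcpy(r->data[head % RING_SLOTS], frame, n);
    r->len[head % RING_SLOTS] = n;
    /* publish the slot only after its contents are written */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* User-space poll: copies one frame into out (at least SLOT_BYTES bytes)
   and returns its length, or 0 if nothing is pending. No system call. */
uint16_t ring_consume(struct rx_ring *r, uint8_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head)
        return 0;
    uint16_t n = r->len[tail % RING_SLOTS];
    memcpy(out, r->data[tail % RING_SLOTS], n);
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return n;
}
```

The consumer never enters the kernel, which is the whole point: there is no context switch and no kernel TCP/IP processing on the data path.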
I used to work on a video platform where network ingress is done by an FPGA. Once configured, it can route 10 Gbit/s of UDP packets into the system and automatically route packets matching certain MPEG PS PIDs out to the CPU. It can filter some other video/audio packets into other parts of the system at 10 Gbit/s wire speed, even in a very low-end FPGA.
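The PID matching that such an FPGA does can be modeled in a few lines of C. This assumes the UDP payload carries whole MPEG transport stream (TS) packets, which is where PIDs are defined: 188 bytes each, sync byte 0x47, and a 13-bit PID in bytes 1 and 2. A one-bit-per-PID table is the kind of lookup that is easy to do at wire speed in hardware. The names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define TS_PACKET_LEN 188
#define TS_SYNC_BYTE  0x47

/* 13-bit PID of a TS packet, or -1 if it does not start with the sync byte. */
int ts_pid(const uint8_t *pkt)
{
    if (pkt[0] != TS_SYNC_BYTE)
        return -1;
    return ((pkt[1] & 0x1F) << 8) | pkt[2];
}

/* One bit per possible PID (8192 PIDs). */
struct pid_filter { uint8_t bits[8192 / 8]; };

void pid_filter_add(struct pid_filter *f, uint16_t pid)
{
    pid &= 0x1FFF;
    f->bits[pid >> 3] |= (uint8_t)(1u << (pid & 7));
}

int pid_filter_match(const struct pid_filter *f, const uint8_t *pkt)
{
    int pid = ts_pid(pkt);
    return pid >= 0 && ((f->bits[pid >> 3] >> (pid & 7)) & 1);
}

/* Count matching TS packets in a UDP payload made of whole TS packets. */
size_t count_matches(const struct pid_filter *f, const uint8_t *payload, size_t len)
{
    size_t hits = 0;
    for (size_t off = 0; off + TS_PACKET_LEN <= len; off += TS_PACKET_LEN)
        hits += (size_t)pid_filter_match(f, payload + off);
    return hits;
}
```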
