Significantly different LAN transmit speed - performance

I have two machines, one Mac and one Linux, on the same local network. I tried transferring files by using one of them as an HTTP server, and it turned out the download speed differed considerably depending on which machine was the server. With the Mac as the server, the download speed was around 3 MB/s; the other way around, it was about 12 MB/s. I then used iperf3 to test the speed between them and got a similar result:
When Mac was the server and Linux the client:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 28.7 MBytes 2942 KBytes/sec 1905 sender
[ 5] 0.00-10.00 sec 28.4 MBytes 2913 KBytes/sec receiver
When Linux was the server and Mac the client:
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 162 MBytes 16572 KBytes/sec sender
[ 4] 0.00-10.00 sec 161 MBytes 16526 KBytes/sec receiver
I asked a friend to run the download test for me, and he told me the speeds were both around 1 MB/s on his two Macs, which is far below the router's capacity. How could this happen?
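If you want to take HTTP and iperf3 configuration out of the picture, a bare TCP sender like the sketch below can be run in each direction (the address and port are placeholders; the other end just needs to accept the connection and keep reading):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Sketch only: sends zeros over a plain TCP connection for ~10 seconds and
 * reports the average rate. Builds on both macOS and Linux. */
int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in to = {0};
    to.sin_family = AF_INET;
    to.sin_port = htons(5001);                        /* placeholder port    */
    inet_pton(AF_INET, "192.168.1.10", &to.sin_addr); /* placeholder address */
    if (connect(s, (struct sockaddr *)&to, sizeof(to)) < 0) {
        perror("connect");
        return 1;
    }

    static char buf[64 * 1024];                       /* zero-filled payload */
    long long total = 0;
    time_t start = time(NULL);
    while (time(NULL) - start < 10) {
        ssize_t n = send(s, buf, sizeof(buf), 0);
        if (n <= 0) { perror("send"); break; }
        total += n;
    }
    printf("sent %.1f MB in ~10 s (about %.1f MB/s)\n", total / 1e6, total / 1e6 / 10.0);
    close(s);
    return 0;
}

If the asymmetry shows up here as well, it is not an HTTP or iperf3 artifact but something in the path, or in one machine's TCP stack or NIC.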

This isn't going to be much of an answer, but it will probably be too long to fit into a comment.
Your observation of "bogus TCP header length" is very interesting; I have never seen it in a capture before, so I wanted to check out exactly what it means. It turns out it means that the Wireshark TCP protocol dissector can't make any sense of the segment, because the header-length field claims a length smaller than the minimum TCP header length of 20 bytes.
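Concretely, the complaint corresponds to a check like the following sketch (plain C for illustration, not Wireshark's actual code):

#include <stdbool.h>
#include <stdint.h>

/* The TCP "data offset" field is the upper 4 bits of byte 12 of the TCP
 * header and counts 32-bit words, so a valid header is at least 5 words
 * (20 bytes) long. Anything smaller is the "bogus" case. */
static bool tcp_header_len_is_bogus(const uint8_t *tcp_header)
{
    unsigned header_len_bytes = (tcp_header[12] >> 4) * 4;
    return header_len_bytes < 20;
}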
So it seems you have an invalid TCP segment. The only two causes I know of are that it was somehow erroneously constructed (i.e. a bug or an intrusion attempt) or that it was corrupted.
I have certainly created plenty of invalid segments when working with raw sockets, and I have seen plenty of forged segments that were not standards-conforming, but neither seems likely to be the case in your situation.
So, based on my experience, it seems most likely that it was somehow corrupted. Although if it was a transmitted packet in the capture, then you are actually sending an invalid segment. So in what follows, I'm assuming it was a received segment.
So where could it have been corrupted? The first mystery is that you are seeing it at all in a capture. If it had been corrupted in the network, the Frame Check Sequence (FCS, a CRC) shouldn't match, and it should have been discarded.
However, it is possible to configure your NIC/driver to deliver frames with an invalid FCS. On Linux you would check/configure these settings with ethtool, and the relevant features are rx-fcs and rx-all (sorry, I don't know how to do this on a Mac). If those are both "off", your NIC/driver should not be handing you frames with an invalid FCS, and hence they wouldn't appear in a capture.
Since you are seeing the segments with an invalid TCP header length in your capture, and assuming your NIC/Driver is configured to drop segments with an invalid FCS, then your NIC saw a valid segment on the wire, and the segment was either corrupted before the FCS was calculated by a transmitter (usually done in the NIC), or corrupted after the FCS was validated by the receiving NIC.
In both these cases, there is a DMA transfer over a bus (e.g. PCI-e) between CPU memory and the NIC. I'm guessing there is a hardware problem causing corruption here, but I'm not so confident in this guess as I have little information to go on.
You might try getting a capture on both ends to compare what is transmitted to what is received (particularly in the case of segments with invalid TCP header lengths). You can match segments in the captures using the ID field in the IP header (assuming that doesn't get corrupted as well).
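If it helps with that comparison, here is a minimal libpcap sketch (assumptions: plain Ethernet + IPv4 with no VLAN tags; the interface name "en0" is just an example) that prints the IP Identification field and the claimed TCP header length of each captured segment, so the two captures can be lined up and the bogus ones spotted:

#include <pcap/pcap.h>
#include <stdio.h>

static void handler(u_char *user, const struct pcap_pkthdr *h, const u_char *bytes)
{
    (void)user;
    if (h->caplen < 14 + 20) return;              /* too short for Eth + IP   */
    const u_char *ip = bytes + 14;                /* skip the Ethernet header */
    if ((ip[0] >> 4) != 4 || ip[9] != 6) return;  /* IPv4 carrying TCP only   */
    unsigned ihl   = (ip[0] & 0x0f) * 4;          /* IP header length         */
    unsigned ip_id = (ip[4] << 8) | ip[5];        /* Identification field     */
    if (h->caplen < 14 + ihl + 13) return;
    const u_char *tcp = ip + ihl;
    unsigned tcp_hlen = (tcp[12] >> 4) * 4;       /* claimed TCP header length */
    printf("ip.id=0x%04x tcp.hdr_len=%u%s\n",
           ip_id, tcp_hlen, tcp_hlen < 20 ? "  <-- bogus" : "");
}

int main(void)
{
    char err[PCAP_ERRBUF_SIZE];
    pcap_t *p = pcap_open_live("en0", 65535, 1, 100, err);  /* interface name is an example */
    if (!p) { fprintf(stderr, "pcap_open_live: %s\n", err); return 1; }
    pcap_loop(p, -1, handler, NULL);
    pcap_close(p);
    return 0;
}

Compile with something like cc -o tcpid tcpid.c -lpcap and run it on both machines while reproducing the transfer.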
Good luck figuring it out!

Related

Synchronisation for audio decoders

There's the following setup (it's basically a pair of TWS earbuds and a smartphone):
Two audio sink devices (the buds), both connected to the same source device. One of them is primary (and is responsible for handling the connection), the other is secondary (and simply sniffs data).
The source device transmits a stream of encoded data, and the sink devices need to decode and play it in sync with each other. The problem is that there's a considerable delay between the two receivers (~5 ms @ 300 kbps, ~10 ms @ 600 kbps and @ 900 kbps).
The synchronisation mechanism that is already implemented simply doesn't seem to work, so it seems my only option is to implement another one.
It's possible to send messages between the buds (but because this uses the same radio interface as the sink-to-source communication, only a small number of bytes can be transferred at relatively long intervals, i.e. 48 bytes per 300 ms, maybe a few times more, but probably not by much) and to control the decoder library.
I tried the following simple algorithm (sketched in code after this question): every 50 milliseconds the secondary sends the primary a message containing the number of packets it has decoded. The primary receives it and updates the state of its decoder accordingly: the decoder on the primary only decodes if the difference between the number of frames it has already decoded and the count received from the peer is between 0 and 100 (each frame is 2.(6) ms), and the cycle continues.
This actually only makes things worse: now latency is about 200 ms or even higher.
Is there something that could be done to improve my synchronisation method, or would I be better off using something else? If so, what would be best in such a case? Fixing the already existing implementation would probably be the best way, but it seems to be closed-source, so I cannot modify it.
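For reference, the gating rule described above boils down to something like this minimal sketch (not the asker's code; local_decoded and peer_decoded are assumed counters, with peer_decoded updated only every ~50 ms):

#include <stdbool.h>
#include <stdint.h>

/* Returns true if the primary is allowed to decode its next frame, given its
 * own decoded-frame counter and the latest counter reported by the secondary
 * (stale by up to ~50 ms). Each frame is ~2.67 ms. */
static bool primary_may_decode_next(uint32_t local_decoded, uint32_t peer_decoded)
{
    int32_t ahead_by = (int32_t)(local_decoded - peer_decoded); /* wrap-safe */
    return ahead_by >= 0 && ahead_by <= 100;
}

Since the reported counter is always up to ~50 ms (roughly 19 frames) out of date, a hard stop-and-go gate like this tends to make the primary stall and catch up in bursts, which would be consistent with the latency getting worse rather than better.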

On Windows, is WSASendTo() faster than sendto()?

Is WSASendTo() somehow faster than sendto() on Windows?
Is UDP sendto() faster with a non-blocking socket (if there is space in the send buffer)?
Similar to this question: Faster WinSock sendto()
From my profiling, the send is network-bound with a blocking socket; for example, on a 100 Mbit network both send about 38461 datagrams of 256 bytes per second, which is roughly what the network allows. I was wondering if anyone has a preference between the two, speed-wise.
Sending from localhost to itself on 127.0.0.1, it seems to handle about 250k sends/s, which should be about 64 MByte/s on a 3 GHz PC.
It seems about 2 times faster blocking, i.e. without FIONBIO set; with non-blocking set it seems to drop to 32 MByte/s if I retry on EWOULDBLOCK.
I don't need to do any heavy-duty UDP broadcasting; I'm only wondering about the most efficient way, if anyone has any deep-set "feelings"?
Also, could there be some sort of transmission moderation taking place in network card drivers? Is there a maximum number of datagrams sendable on a gigabit card, say; would it tolerate, for example, 100k sends/s, or moderate it somehow?
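For reference, a minimal sketch of the two calls being compared (the address and port are arbitrary examples; error handling trimmed). With lpOverlapped and lpCompletionRoutine left NULL, WSASendTo() behaves synchronously, much like sendto():

#include <winsock2.h>
#include <ws2tcpip.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

int main(void)
{
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0) return 1;

    SOCKET s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    struct sockaddr_in to = {0};
    to.sin_family = AF_INET;
    to.sin_port = htons(9999);                    /* example port    */
    inet_pton(AF_INET, "127.0.0.1", &to.sin_addr); /* example address */

    char payload[256] = {0};

    /* Classic BSD-style call. */
    int n1 = sendto(s, payload, sizeof(payload), 0,
                    (struct sockaddr *)&to, sizeof(to));

    /* Winsock-native call, synchronous when no OVERLAPPED is supplied. */
    WSABUF buf = { sizeof(payload), payload };
    DWORD sent = 0;
    int n2 = WSASendTo(s, &buf, 1, &sent, 0,
                       (struct sockaddr *)&to, sizeof(to), NULL, NULL);

    printf("sendto=%d WSASendTo=%d sent=%lu\n", n1, n2, (unsigned long)sent);
    closesocket(s);
    WSACleanup();
    return 0;
}

Timing a tight loop around each of these calls (same socket, same payload) is one way to answer the "preference" question on your own hardware.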

STM32F411: I need to send a lot of data over USB at high speed

I'm using an STM32F411 with the USB CDC library, and the max speed for this library is ~1 Mb/s.
I'm creating a project where I have 8 microphones connected to ADC lines (this part works fine). I need a 16-bit signal, so I increase the accuracy by summing the first 16 samples from one line (the ADC gives only 12-bit samples); this accumulation is sketched after the question. In my project I need 96k 16-bit samples per line, so that's 0.768M samples for all 8 lines. This data needs 12000 Kb of space, but the STM32 has only 128 Kb of SRAM, so I decided to send about 120 chunks of ~100 Kb of data each second.
The conclusion is that I need ~11.72 Mb/s to send this.
The problem is that I'm unable to do that, because CDC USB limits me to ~1 Mb/s.
The question is how to increase the USB speed to 12 Mb/s on the STM32F4. I need some pointer or library.
Or maybe should I set up "audio device" in CubeMX?
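As an aside, the 12-bit-to-16-bit accumulation described above as a minimal sketch (read_adc_12bit() is a hypothetical placeholder for however the samples are actually obtained, e.g. from a DMA buffer):

#include <stdint.h>

/* Hypothetical stand-in so the sketch compiles on its own; on the real board
 * this would come from the ADC driver or a DMA buffer. */
static uint16_t read_adc_12bit(unsigned channel) { (void)channel; return 2048; }

/* Sum of sixteen 12-bit readings: 16 * 4095 = 65520, so the result still
 * fits in a uint16_t. */
uint16_t sample_16bit(unsigned channel)
{
    uint32_t acc = 0;
    for (int i = 0; i < 16; i++)
        acc += read_adc_12bit(channel);
    return (uint16_t)acc;
}

The required throughput follows directly: 96000 samples/s x 16 bits x 8 lines = 12.288 Mbit/s, which is why the question lands right at the Full-Speed USB limit discussed in the answer below.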
If a lowercase "b" means bytes in your question, the answer is: it is not possible, as your micro has a Full-Speed (FS) USB peripheral whose maximum speed is 12 Mbit per second.
If it means bits, your 1 Mb (bit) speed assumption is wrong, but you still will not reach 12 Mbit of payload transfer.
You may try to write your own class (only if "b" means bits), but I'm afraid you will not find a ready-made library. You will also need to write the device driver on the host computer.

Gianfar Linux Kernel Driver Maximum Receive/Transmit Size

I have been trying to understand the code for the gianfar Linux Ethernet driver and was having difficulty understanding fragmented pages. I understand the maximum transmission size is 9600 bytes; however, does this include fragments?
Is it possible to send and receive transmissions that are larger in size (e.g. 14000 bytes) if they are split among multiple fragments?
Thank you in advance
9600 is the jumbo frame maximum size. The maximum MTU ("jumbo MTU") size is 9600 - 14 = 9586 bytes. Also, if I recall correctly, the MTU never includes the 4-byte FCS.
So, 9586 must be simply the maximum Ethernet "payload" size which can be put on wire. It's a limitation with respect to a single Ethernet frame. So, if you have a larger chunk of data ("transmission"), you might be able to "slice" it and produce multiple Ethernet frames from it (to be precise, multiple independent skb-s), each fitting the MTU size. So, in this case you will have multiple independent Ethernet frames to be handed over to the network driver. The interconnection between these frames will only be detectable on the IP header level, i.e., if you peek at IP header of the 1st frame you will be able to see "more fragments" flag indicating that the next frame contains an IP packet which is the next fragment of the original (large) chunk of data. But from the driver's point of view such frames should remain independent.
However, if you mean "skb fragments" rather than "IP fragments", then putting a 14000 byte frame into multiple fragments ("data fragments") of a single skb might not be helpful with respect to the MTU (say, you've configured the jumbo MTU on the interface). Because these fragments are just smaller chunks of contiguous memory containing different parts of the same Ethernet frame. And the driver just makes multiple descriptors pointing to these chunks of memory. The hardware will pick them to send a single frame. And if the HW sees that the overall frame length is bigger than the maximum MTU, it might decline the transmission. Exact behaviour in this case is a topic for a separate talk.
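To make the "skb fragments" case concrete, here is a minimal sketch of how a driver typically walks one skb consisting of a linear part plus paged fragments (illustrative only, not gianfar's actual code):

#include <linux/printk.h>
#include <linux/skbuff.h>

/* All of the pieces below together describe a single Ethernet frame, so it
 * is skb->len (the total) that matters against the MTU, not the size of any
 * individual fragment. */
static void walk_skb_pieces(struct sk_buff *skb)
{
    unsigned int i;

    /* Linear (contiguous) part of the frame. */
    pr_info("linear part: %u bytes\n", skb_headlen(skb));

    /* Paged fragments: each would get its own DMA descriptor, all pointing
     * at parts of the same frame. */
    for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
        const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
        pr_info("frag %u: %u bytes\n", i, skb_frag_size(frag));
    }

    pr_info("total frame length: %u bytes\n", skb->len);
}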

Is Realterm dropping characters or am I?

I'm using a SAML21 board to accept some data over a serial connection and, at the moment, just mirror it to a serial port on a computer. However, this data is 6 bytes at ~250 Hz (it was closer to 3 kHz before). As far as I can tell I'm tracking the start and end bytes correctly, however my columnar alignment gets out of whack on occasion in RealTerm.
I have it set up for 6 bytes in single mode, so all columns should be presenting the same bytes up and down. However, over time, as I increase the rate at which I mirror (I am still receiving the data at a fixed rate), the first column's byte tends to float.
I have not used RealTerm at speeds this high before, so I am not aware of its limitations.
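One way to take RealTerm out of the equation is to log the raw bytes on the PC and run them through a tiny frame checker like the sketch below (START, END and the 6-byte length are assumptions; substitute whatever your protocol actually uses). If it never reports a resync, the stream is intact and the drift is only in the display:

#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN 6
#define START 0xAA   /* placeholder start byte */
#define END   0x55   /* placeholder end byte   */

static uint8_t frame[FRAME_LEN];
static size_t fill;
static unsigned long good_frames, resyncs;

/* Feed every received byte through this; it hunts for the start byte and
 * counts frames whose last byte is the expected end marker. */
void feed_byte(uint8_t b)
{
    if (fill == 0 && b != START) { resyncs++; return; }
    frame[fill++] = b;
    if (fill == FRAME_LEN) {
        if (frame[FRAME_LEN - 1] == END)
            good_frames++;
        else
            resyncs++;
        fill = 0;
    }
}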
