Buffer limitation of the Windows TCP stack, using Winsock in particular - windows

While working on a Windows application which communicates through Winsock, I've encountered the following scenario:
Alice initiates a TCP session with Bob.
Bob accepts, and the session is established.
Bob sends loads of data (~1000 MB) sequentially.
Bob moves on to do other things.
Meanwhile, Alice slowly reads the data, N bytes at a time (where N is the size of Alice's buffer, which is allocated only once, since the data is written to a file between each read; this buffer is allocated by the application).
When debugging this, I found that Bob's send() never blocks, even when I paused Alice before the first read.
The question is: what guarantees that the entire data (~1000 MB) will be kept available for Alice to read? Is there a known/configurable parameter that limits this buffer's length?

Alice has a socket receive buffer, and Bob has a socket send buffer. Both exist for the lifetime of the respective sockets, and both are bounded. Data is removed from Bob's send buffer when Alice's TCP acknowledges it, and from Alice's receive buffer when Alice reads it. Nothing keeps ~1000 MB buffered: once Alice's receive buffer fills, TCP flow control stops Bob's TCP from sending more, Bob's send buffer fills in turn, and then Bob's send() blocks (or fails with WSAEWOULDBLOCK on a non-blocking socket).
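If you want to inspect or change those limits, the usual knobs in Winsock are SO_RCVBUF and SO_SNDBUF. A minimal sketch (the value set below is illustrative, not a recommendation):

#include <winsock2.h>
#include <stdio.h>

/* Query the receive buffer and resize the send buffer of an existing socket. */
void show_and_resize_buffers(SOCKET s)
{
    int rcvbuf = 0;
    int len = sizeof(rcvbuf);

    if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, (char *)&rcvbuf, &len) == 0)
        printf("receive buffer: %d bytes\n", rcvbuf);

    int sndbuf = 256 * 1024;  /* illustrative size */
    setsockopt(s, SOL_SOCKET, SO_SNDBUF, (const char *)&sndbuf, sizeof(sndbuf));
}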

Related

Synchronisation for audio decoders

There's the following setup (it's basically a pair of TWS earbuds and a smartphone):
2 audio sink devices (or buds), both connected to the same source device. One of these devices is primary (and is responsible for handling the connection), the other is secondary (and simply sniffs data).
The source device transmits a stream of encoded data, and the sink devices need to decode and play it in sync with each other. The problem is that there's a considerable delay between the receivers (~5 ms at 300 kbps, ~10 ms at 600 kbps and at 900 kbps).
The synchronisation mechanism which is already implemented simply doesn't want to work, so it seems that my only option is to implement another one.
It's possible to send messages between the buds (but because this uses the same radio interface as the sink-to-source communication, only a small number of bytes at a relatively large interval can be transferred, i.e. 48 bytes per 300 ms, maybe a few times more, but probably not by much) and to control the decoder library.
I tried the following simple algorithm: every 50 milliseconds the secondary sends the primary a message containing the number of packets it has decoded. The primary receives it and updates the state of its decoder accordingly: it only decodes if the difference between the number of frames it has already decoded and the number received from the peer is between 0 and 100 (every frame is 2.(6) ms), and the cycle continues.
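In rough C, the gating on the primary looks like this (a sketch with names of my own; the real code drives the decoder library):

/* Updated whenever one of the 50 ms messages arrives from the secondary. */
static volatile unsigned frames_decoded_by_peer;
static unsigned frames_decoded_locally;

#define MAX_LEAD_FRAMES 100   /* each frame is 2.(6) ms */

void primary_decode_step(void)
{
    int lead = (int)(frames_decoded_locally - frames_decoded_by_peer);

    /* Only decode while we are 0..100 frames ahead of the secondary. */
    if (lead >= 0 && lead <= MAX_LEAD_FRAMES) {
        decode_one_frame();   /* illustrative decoder-library call */
        frames_decoded_locally++;
    }
}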
This actually only makes things worse: the latency is now about 200 ms or even higher.
Is there something that could be done to improve my synchronization method, or would I be better off using something else? If so, what would work best in such a case? Fixing the already existing implementation would probably be the best way, but it seems to be closed-source, so I cannot modify it.

How can I read the received packets with a NDIS filter driver?

I am currently experimenting with the NDIS driver samples.
I am trying to print the packets' contents (including the MAC addresses, EtherType and the data).
My first guess was to implement this in the function FilterReceiveNetBufferLists. Unfortunately I am not sure how to extract the packet contents out of the NetBufferLists.
That's the right place to start. Consider this code:
void FilterReceiveNetBufferLists(..., NET_BUFFER_LIST *nblChain, ...)
{
    UCHAR buffer[14];   // room for one Ethernet header
    UCHAR *header;

    // Walk the chain of NBLs; on the receive path each NBL holds exactly one NB.
    for (NET_BUFFER_LIST *nbl = nblChain; nbl; nbl = nbl->Next) {
        // Get a contiguous view of the first 14 bytes. If they span more
        // than one MDL, NdisGetDataBuffer copies them into 'buffer' for us.
        header = NdisGetDataBuffer(nbl->FirstNetBuffer, sizeof(buffer), buffer, 1, 1);
        if (!header)
            continue;

        DbgPrint("MAC address: %02x-%02x-%02x-%02x-%02x-%02x\n",
                 header[0], header[1], header[2],
                 header[3], header[4], header[5]);
    }

    NdisFIndicateReceiveNetBufferLists(..., nblChain, ...);
}
There are a few points to consider about this code.
The NDIS datapath uses the NET_BUFFER_LIST (nbl) as its primary data structure. An nbl represents a set of packets that all have the same metadata. For the receive path, nobody really knows much about the metadata, so that set always has exactly 1 packet in it. In other words, the nbl is a list... of length 1. For the receive path, you can count on it.
The nbl is a list of one or more NET_BUFFER (nb) structures. An nb represents a single network frame (subject to LSO or RSC). So the nb corresponds most closely to what you think of as a packet. Its metadata is stored on the nbl that contains it.
Within an nb, the actual packet payload is stored as one or more buffers, each represented as an MDL. Mentally, you should pretend the MDLs are just concatenated together. For example, the network headers might be in one MDL, while the rest of the payload might be in another MDL.
Finally, for performance, NDIS gives as many NBLs to your LWF as possible. This means there's a list of one or more NBLs.
Put it all together, and you have:
Your function receives a list of NBLs.
Each NBL contains exactly 1 NB (on the receive path).
Each NB contains a list of MDLs.
Each MDL points to a buffer of payload.
So in our example code above, the for-loop iterates along that first bullet point: the chain of NBLs. Within the loop, we only need to look at nbl->FirstNetBuffer, since we can safely assume there is no other nb besides the first.
It's inconvenient to have to fiddle with all those MDLs directly, so we use the helper routine NdisGetDataBuffer. You tell this guy how many bytes of payload you want to see, and he'll give you a pointer to a contiguous range of payload.
In the good case, your buffer is contained in a single MDL, so NdisGetDataBuffer just gives you a pointer back into that MDL's buffer.
In the slow case, your buffer straddles more than one MDL, so NdisGetDataBuffer carefully copies the relevant bit of payload into a scratch buffer that you provided.
The latter case can be fiddly if you're trying to inspect more than a few bytes. If you're reading all 1500 bytes of the packet, you can't just allocate 1500 bytes on the stack (kernel stack space is scarce, unlike usermode), so you have to allocate it from the pool. Once you figure that out, note that it will slow things down to copy all 1500 bytes of data into a scratch buffer for every packet. Is the slowdown too much? It depends on your needs. If you're only inspecting occasional packets, or if you're deploying the LWF on a low-throughput NIC, it won't matter. If you're trying to get beyond 1 Gbps, you shouldn't be memcpying so much data around.
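A minimal sketch of that pool-based approach (the pool tag and the inspection helper are mine; per-packet allocation is shown for brevity, a real LWF would preallocate the scratch buffer):

ULONG cap = 1500;
UCHAR *scratch = (UCHAR *)ExAllocatePoolWithTag(NonPagedPoolNx, cap, 'wfLN');
if (scratch != NULL) {
    NET_BUFFER *nb = nbl->FirstNetBuffer;
    ULONG len = NET_BUFFER_DATA_LENGTH(nb);
    if (len > cap)
        len = cap;

    // Returns a direct pointer when one MDL covers 'len' bytes;
    // otherwise copies those bytes into 'scratch' and returns it.
    UCHAR *data = (UCHAR *)NdisGetDataBuffer(nb, len, scratch, 1, 0);
    if (data != NULL)
        InspectFrame(data, len);   /* illustrative helper */

    ExFreePoolWithTag(scratch, 'wfLN');
}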
Also note that if you ultimately want to modify the packet, you'll need to be wary of NdisGetDataBuffer. It can give you a copy of the data (stored in your local scratch buffer), so if you modify the payload, those changes won't actually stick to the packet.
What if you do need to scale to high throughputs, or modify the payload? Then you need to work out how to manipulate the MDL chain. That's a bit confusing at first, but spend a little time with the documentation and draw yourself some whiteboard diagrams.
I suggest first starting out by understanding an MDL. From networking's point of view, an MDL is just a fancy way of holding a { char * buffer, size_t length }, along with a link to the next MDL.
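In code, walking an nb's MDL chain by hand looks roughly like this (a sketch that glosses over the offset into the first MDL, covered below):

NET_BUFFER *nb = nbl->FirstNetBuffer;
for (MDL *mdl = NET_BUFFER_CURRENT_MDL(nb); mdl != NULL; mdl = mdl->Next) {
    ULONG bytes = MmGetMdlByteCount(mdl);
    UCHAR *va = (UCHAR *)MmGetSystemAddressForMdlSafe(mdl, NormalPagePriority);
    if (va == NULL)
        break;   /* mapping can fail under memory pressure */

    /* ... look at 'bytes' bytes of payload at 'va' ... */
}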
Next, consider the NB's DataOffset and DataLength. These conceptually move the buffer boundaries in from the beginning and the end of the buffer. They don't really care about MDL boundaries -- for example, you can reduce the length of the packet payload by decrementing DataLength, and if that means that one or more MDLs are no longer contributing any buffer space to the packet payload, it's no big deal, they're just ignored.
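For instance, a sketch of trimming four bytes off the tail of a packet, no MDL surgery required:

NET_BUFFER *nb = nbl->FirstNetBuffer;
if (NET_BUFFER_DATA_LENGTH(nb) > 4)
    NET_BUFFER_DATA_LENGTH(nb) -= 4;   // trailing MDLs past the new end are simply ignored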
Finally, add on top CurrentMdl and CurrentMdlOffset. These are redundant with everything above, but they exist for (microbenchmark) performance. You aren't required to even think about them if you're reading the NB, but if you are editing the size of the NB, you do need to update them.
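Fortunately you rarely update them by hand: the helpers NdisAdvanceNetBufferDataStart and NdisRetreatNetBufferDataStart adjust DataOffset and the Current* fields together. A sketch of stripping and restoring a hypothetical 4-byte outer header (nb as above):

// Advance the data start past 4 bytes at the front;
// DataOffset, CurrentMdl and CurrentMdlOffset are all updated for us.
NdisAdvanceNetBufferDataStart(nb, 4, FALSE, NULL);

// ... later, undo it before indicating the packet onward. Retreat can
// fail if it has to allocate a new MDL to cover the reclaimed space.
NDIS_STATUS status = NdisRetreatNetBufferDataStart(nb, 4, 0, NULL);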

C socket - Difference between sending and receiving time

I'm working with two devices that have their clocks correctly synchronized (offset less than 1 ms). I need to send 180 KB over WiFi (estimated bandwidth is about 20 Mb/s).
I'm using the C function send (with TCP) on the sender and recv on the receiver. Since the two clocks are synchronized, I expect the sending time and the receiving time to be the same (without taking the propagation time into account).
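Roughly, this is how each side is timed (a sketch; the receiver wraps its recv() loop the same way, and I assume CLOCK_REALTIME reads the synchronized clock):

#include <time.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Wall-clock span, in seconds, of the whole send() loop. */
double timed_send(int sock, const char *buf, size_t total)
{
    struct timespec t0, t1;
    size_t sent = 0;

    clock_gettime(CLOCK_REALTIME, &t0);
    while (sent < total) {
        ssize_t n = send(sock, buf + sent, total - sent, 0);
        if (n <= 0)
            break;   /* error handling elided */
        sent += (size_t)n;
    }
    clock_gettime(CLOCK_REALTIME, &t1);

    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}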
However, I found that the receiving time is 10-15 ms higher than the sending time, and considering that the estimated sending/receiving time should be about 60 ms, this difference is quite high. I don't think the problem is due to the processing through the TCP stack on the receiver.
Any idea?

What happens when the recv function in Winsock is called and not all the data has been received?

From what I've read about Winsock, recv has a tendency not to receive all the data from a sender in a single call. When people say this, do they mean, for example: I send 300 bytes from a client and call recv on the server; is it possible that it receives only 200-some bytes on its first call, and the buffer is filled with those 200 bytes? What happens to the last 100 bytes?
Also, let's say a buffer is too small, like 512 bytes, and the client sends 600. Will the first recv call fill the buffer to capacity and then just drop the last 88 bytes? If I call recv again with a different buffer, will the first 88 bytes of that buffer be the rest of the data?
And thirdly, if one and two are true, and I receive a whole packet of data in separate buffers, will I have to splice them together into one buffer to start parsing my data out of it?
I'm assuming TCP here.
Is it possible that it could receive only 200-some bytes on its first call and the buffer will be filled with those 200 bytes?
Yes.
What happens to the last 100 bytes?
You'll receive them next time.
Also, let's say a buffer is too small, like 512 bytes, and the client sends 600. Will the first recv call fill the buffer to capacity and then just drop the last 88 bytes?
No.
If I call recv again with a different buffer, will the first 88 bytes of that buffer be the rest of the data?
Yes.
And thirdly, if one and two are true, and I receive a whole packet of data in separate buffers, will I have to splice them together into one buffer to start parsing my data out of it?
That's up to you. Do you need the data in one contiguous buffer? If so, yes.
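A minimal sketch of the usual pattern, looping until the byte count your protocol promised has arrived (Winsock; error handling kept short):

#include <winsock2.h>

/* Read exactly 'want' bytes into 'buf'; returns 0 on success, -1 on error or EOF. */
int recv_all(SOCKET s, char *buf, int want)
{
    int got = 0;
    while (got < want) {
        int n = recv(s, buf + got, want - got, 0);
        if (n == 0 || n == SOCKET_ERROR)   /* peer closed, or failure */
            return -1;
        got += n;                          /* recv may return fewer bytes than asked */
    }
    return 0;
}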

Why wait SIFS time before sending ACK?

A question about the MAC protocol of 802.11 WiFi.
We have learned that when a station has received the data, it waits for SIFS time. Then it sends the ACK. When searching online, the reason that is always mentioned is to give ACK packets a higher priority. This is understandable, since a station first has to wait DIFS time when it wants to send normal data (and DIFS is longer than SIFS).
But why wait at all? Why not send the ACK immediately? The station knows the data has arrived and the CRC is correct, so why wait?
It is theoretically possible to know that the CRC is correct at the exact end of the received data from the wire, but in practice you need to accumulate all the samples in the last block in order to run the FFT, deconvolution and FEC; only then, after finally getting the input data out of the waveform, do you know that the CRC is correct. Also, you sometimes need to turn on transmit circuitry to send the ACK, which can hamper receive performance. If each step in the processing chain were instantaneous, if the transmit circuitry definitely didn't interfere with the receive circuitry, and if there were no lead time necessary for building the waveform for the ACK, it would be possible to send the ACK immediately after getting the last bit of the waveform. But while each element in this chain takes some deterministic time, it is not instantaneous. SIFS gives the receiver time to get the data from the PHY, verify it, and send the response.
Disclaimer: I'm more familiar with Homeplug than 802.11.
It is like that because Distributed Coordination Function (DCF) and Point Coordination Function (PCF) modes can coexist within one cell. That is, a base station may use polling while at the same time the cell uses distributed coordination via CSMA/CA.
So during SIFS, control frames or the next fragment may be sent; during PIFS, PCF frames may be sent; and during DIFS, DCF frames may be sent. During SIFS and PIFS, PCF can work its magic.
Even though not all base stations support PCF, all stations must wait, since some may support it.
Update:
The way I understand this now is that during SIFS the station may send RTS, CTS, or ACK and have enough time to switch back to receive mode before the sender starts to transmit. If that's correct, it will send the ACK during SIFS, then change to receive mode and wait until SIFS elapses. When SIFS has elapsed, the transmitter will start sending.
Also, PCF is controlled by PIFS, which comes after SIFS and before DIFS, and is therefore not relevant for this discussion (my mistake). That is, SIFS < PIFS < DIFS < EIFS.
Sources: This PDF (page 8) and Computer Networks by Andrew S. Tanenbaum
SIFS = RTT (based on the PHY transmission rate)
     + frame processing delay at the receiver (PHY processing delay + MAC processing delay)
     + frame processing delay for composing the response (CTS/ACK)
     + RF tuner delay (change from RX to TX)
At the transmitter side, after the last PHY symbol (of an RTS, for example), it is the time required to change back to RX mode (at RF). So I would see SIFS as an RX-side calculation rather than a TX-side one.
I can't say for sure, but it sounds like an optimization strategy similar to TCP's cumulative acknowledgments. If you don't require an ACK for every data packet, it makes sense to hold off for a bit so that, if more data packets arrive, you can acknowledge them all with a single ACK.
Example: a client sends 400 packets really fast to the server. Rather than the server sending back 400 ACKs, it can simply wait until the client takes a breather before sending a single ACK back. Combined with the likelihood that the client will take a breather even under heavy load (it has to, as its unacknowledged-packets buffer fills up), this would be workable.
This is possible in systems where ACK(n) means "I've received everything up to and including packet #n".
You'll get better performance and less traffic by using such a strategy. As long as the wait-before-sending-ACK time on the receiver is less than the retransmit-if-no-ACK time on the sender (taking transmission delays into account), there should be no problem.
Quick crash-course on 802.11:
802.11 is essentially a giant system of timers. The most common implementations of 802.11 use the distributed coordination function, DCF. The DCF allows nodes to come in and out of the range of a radio channel being used for 802.11 and to coordinate in a distributed fashion who should be sending and receiving data (ignoring hidden and exposed node problems for this discussion). Before any node can begin sending data on the channel, all of them must wait a period of DIFS in which the channel is determined to be idle; if it is idle during a DIFS period, the first node to grab the channel begins transmitting.

In standard 802.11, i.e. non-802.11e and non-802.11n implementations, every single data packet that gets transmitted needs to be acknowledged with an ACK frame, regardless of the upper-layer protocol being used. After a data packet gets sent, a SIFS time period needs to expire; once SIFS expires, control frames destined for the node that has "taken" control of the channel may be sent, in this instance an acknowledgment frame. SIFS allows the node that sent the data packet to switch from transmitting to receiving mode.

If a packet does get lost and no ACK is received after the SIFS/ACK timeout occurs, exponential back-off is invoked. The back-off window, a.k.a. the contention window (CW), begins at a value CWmin; in some Linux implementations this is 15 slot times, where a slot time varies depending on the 802.11 protocol in use. The back-off is then chosen at random between 1 and the current upper limit of the CW. If the current packet was lost, the CW is doubled from 15 to 30, and a random value is chosen between 1 and 30. Every time there is a consecutive loss the CW doubles, up to a limit of 1023, where it stays. Once a packet is received successfully, the CW is reset back to CWmin. A sketch of this bookkeeping follows.
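In C, that contention-window bookkeeping looks roughly like this (a sketch with illustrative names, following the numbers above):

#include <stdlib.h>

#define CW_MIN 15
#define CW_MAX 1023

static unsigned cw = CW_MIN;

/* Number of idle slots to wait before retrying: random in 1..cw. */
unsigned backoff_slots(void) { return 1u + (unsigned)rand() % cw; }

/* Lost packet: double the window, capped (15 -> 30 -> ... -> 1023). */
void on_tx_failure(void) { cw = (cw * 2 <= CW_MAX) ? cw * 2 : CW_MAX; }

/* Successful delivery: reset to the minimum window. */
void on_tx_success(void) { cw = CW_MIN; }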
In regards to 802.11n / 802.11e:
Every data packet still needs to be acknowledged, but when using 802.11e (incorporated into 802.11n), multiple data packets can be aggregated together in two different ways: A-MSDU or A-MPDU. An A-MSDU is a jumbo frame with one checksum for the entire aggregated packet; within it are many sub-frames containing each of the data frames that needed to be sent. If there is any error in the A-MSDU frame and it needs to be retransmitted, every sub-frame must be resent. When using A-MPDU, however, each sub-frame has a small header and checksum that allow any sub-frame with an error in it to be retransmitted by itself, or within another aggregated frame, the next time the sending node gains the channel. With these aggregated sending schemes comes the notion of the block-ack. The block-ack contains a bitmap of the frames, from a starting sequence number, that were just sent in the aggregated packet and received correctly or incorrectly. Aggregated frame sending greatly improves throughput, as more data can be sent per channel acquisition by a sending node, and it also allows out-of-order packet sending. However, out-of-order packet sending greatly complicates the 802.11 MAC layer.
SIFS = D + M + Rx/Tx
where:
D = receiver delay (RF delay) and decoding of the physical layer convergence procedure (PLCP) preamble/header
M = MAC processing delay
Rx/Tx = transceiver turnaround time
None of these delays can be avoided, so the station has to wait SIFS time before sending the acknowledgement.
