What are SO_SNDBUF and SO_RCVBUF? - Windows

Can you explain to me what exactly the SO_SNDBUF and SO_RCVBUF options are?
OK, I understand that the OS buffers the outgoing/incoming data, but I'd like to clarify this subject.
What is their role (generally)?
Are they per-socket buffers?
Is there a connection between Transport layer's buffers (the TCP buffer, for example) and these buffers?
Do they have a different behaviour/role when using stream sockets (TCP) and when using connectionless sockets (UDP)?
A good article will be great too.
I googled it but didn't find any useful information.

The "SO_" prefix is for "socket option", so yes, these are per-socket settings for the per-socket buffers. There are usually system-wide defaults and maximum values.
SO_RCVBUF is simpler to understand: it is the size of the buffer the kernel allocates to hold the data arriving on the given socket, between the time it arrives over the network and the time it is read by the program that owns the socket. With TCP, if data arrives and you aren't reading it, the buffer fills up and the sender is told to slow down (via the TCP window adjustment mechanism). With UDP, once the buffer is full, new packets are simply discarded.
SO_SNDBUF, I think, only matters for TCP (with UDP, whatever you send goes directly out to the network). With TCP, the buffer can fill up for two reasons. One is that the remote side isn't reading: its receive buffer becomes full, TCP communicates this fact to your kernel, and your kernel stops sending data, instead accumulating it in the local buffer until that fills up too. The other is a network problem: the kernel isn't getting acknowledgements for the data it sends, so it slows down sending until, eventually, the outgoing buffer fills up. Once it does, subsequent write() calls on the socket will block (or fail with EAGAIN if you've set the O_NONBLOCK option).
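If it helps to see it in code, here is a minimal Go sketch (the address is a placeholder) that sets both per-socket buffers. Go's SetReadBuffer and SetWriteBuffer wrap setsockopt with SO_RCVBUF and SO_SNDBUF, and the kernel may clamp the request to its configured limits:

    package main

    import (
        "log"
        "net"
    )

    func main() {
        conn, err := net.Dial("tcp", "example.com:80") // placeholder address
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        tcp := conn.(*net.TCPConn)
        if err := tcp.SetReadBuffer(256 * 1024); err != nil { // setsockopt(SO_RCVBUF)
            log.Fatal(err)
        }
        if err := tcp.SetWriteBuffer(256 * 1024); err != nil { // setsockopt(SO_SNDBUF)
            log.Fatal(err)
        }
    }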
This all is best described in the Unix Network Programming book.

What is their role (generally)?
Data that you want to send over a socket is copied to the socket's send buffer, so your code doesn't have to wait (that is, block) until the data has actually been sent out to the network. When the send call returns successfully, it only means that the data has been placed into the send buffer, from where the protocol implementation will read it as soon as it is ready to send that data over the network.
Keep in mind that multiple sockets from multiple processes may all want to send data at the same time, yet at any given moment only one packet can be sent over a network line. While a send is in progress, all other senders have to wait, and once the line is free, the implementation can only process one send request after another.
Data that arrives from the network is written to the socket's receive buffer by the protocol implementation, where it waits until your code reads it. Otherwise all receiving would have to stop until your code had processed the incoming packet; this way your code can do other things while a packet arrives in the background. And since the interface is shared, the system must ensure that other processes can still receive their network data even if your process refuses to process its own incoming data.
Are they per-socket buffers?
Yes. Every socket has its own set of buffers.
Is there a connection between Transport layer's buffers (the TCP buffer, for example) and these buffers?
I'm not sure what you mean by "TCP buffers" but if you are referring to the TCP receive and send windows, the answer is yes.
TCP regularly tells the other side how much room is left in your receive buffer, so that the other side never sends more data than would fit into it. If your receive buffer is full, the other side stops sending completely until there is room again, which will be the case as soon as you read some data from it.
So if you cannot read data often enough to keep your socket buffer from running full, increasing the receive buffer size can prevent TCP connections from having to pause sending data.
On the other hand, if the send buffer is running full, the socket will not accept any more data from your code. Any attempt to send will either block or fail with an error (on a non-blocking socket) until there is room again.
And since TCP can only work with the data currently in the send buffer, the send buffer size also influences TCP's sending behavior. The sending strategy can depend on various factors, one of them being the amount of data that is known to be pending. If your send buffer is just 2 KB, TCP will never see more than 2 KB to send, even though your app may know that much more data is going to follow. If your send buffer is 256 KB and you put 128 KB of data into it, TCP knows it has 128 KB to send on this connection, and this may (and most likely will) influence the sending strategy it uses.
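To make the send-buffer behavior tangible, here is a hedged Go sketch: the accepting side never reads, the client shrinks its send buffer, and once both the local send buffer and the peer's receive buffer are full, Write blocks (a write deadline turns the block into a visible timeout):

    package main

    import (
        "fmt"
        "log"
        "net"
        "time"
    )

    func main() {
        ln, err := net.Listen("tcp", "127.0.0.1:0")
        if err != nil {
            log.Fatal(err)
        }
        go func() {
            c, err := ln.Accept()
            if err != nil {
                return
            }
            defer c.Close() // keep the connection open, but never read from it
            select {}
        }()

        conn, err := net.Dial("tcp", ln.Addr().String())
        if err != nil {
            log.Fatal(err)
        }
        tcp := conn.(*net.TCPConn)
        tcp.SetWriteBuffer(4096) // small SO_SNDBUF so it fills up quickly
        tcp.SetWriteDeadline(time.Now().Add(2 * time.Second))

        var total int
        buf := make([]byte, 64*1024)
        for {
            n, err := tcp.Write(buf)
            total += n
            if err != nil { // i/o timeout once both buffers are full
                fmt.Printf("wrote %d bytes before blocking: %v\n", total, err)
                return
            }
        }
    }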
Do they have a different behaviour/role when using stream sockets (TCP) and when using connectionless sockets (UDP)?
Yes. With TCP, the data you send is just a stream of bytes. There is no relationship between the bytes and the packets being sent out: sending 80 bytes could mean sending one packet with 80 bytes, or ten packets with 8 bytes each. TCP decides that on its own. The same goes for incoming data: if there are 200 bytes in your receive buffer, you cannot know how they got there; the bytes you read from a TCP socket may have been transported in any number of packets. So despite transporting data in chunks over packet-based networks, a TCP connection behaves like a serial line link.
UDP, on the other hand, sends datagrams. If you place 80 bytes into the send buffer of a UDP socket, those 80 bytes are guaranteed to go out in a single UDP packet containing 80 bytes of payload. Data is sent in exactly the chunks you write into the send buffer: write one byte 80 times and 80 packets are sent, each containing one byte of payload. If you tell a TCP socket to send 200 bytes but there is only room for 100 bytes in the send buffer, TCP will add those 100 bytes to the buffer and let you know that 100 of your 200 bytes were accepted. A UDP socket, on the other hand, will block or fail with an error, because either all 200 bytes fit or nothing does; there is no partial fit with UDP.
The same holds when receiving: datagrams, not bytes, are stored in the UDP receive buffer. If a TCP socket first receives 80 bytes of data and then 200 bytes, you can perform a single read call that returns all 280 bytes at once. If a UDP socket first receives an 80-byte datagram and then a 200-byte datagram and you request 280 bytes, you get exactly 80 bytes, because all data returned by one read call must come from the same datagram; you cannot read across datagram borders. Also note that if you request only 20 bytes, you receive the first 20 bytes of the datagram and the other 60 bytes are discarded; the next read starts at the next datagram, the 200-byte one.
So the difference in two sentences: TCP sockets store bytes in their socket buffers, UDP sockets store datagrams. And a datagram must fit completely into the buffer: an incoming datagram that cannot fit completely into the socket buffer is silently discarded, even if the buffer had some room available.
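Here is a small Go sketch of these datagram semantics (Unix-like behavior; on Windows the short read additionally reports an error): the receiver asks for 20 bytes of an 80-byte datagram, the remaining 60 bytes are discarded, and the next read starts at the next datagram:

    package main

    import (
        "fmt"
        "log"
        "net"
    )

    func main() {
        pc, err := net.ListenPacket("udp", "127.0.0.1:0")
        if err != nil {
            log.Fatal(err)
        }
        defer pc.Close()

        conn, err := net.Dial("udp", pc.LocalAddr().String())
        if err != nil {
            log.Fatal(err)
        }
        conn.Write(make([]byte, 80)) // one 80-byte datagram
        conn.Write([]byte("second")) // a separate 6-byte datagram

        small := make([]byte, 20)
        n, _, _ := pc.ReadFrom(small) // 20 bytes; the rest of the datagram is dropped
        fmt.Println("first read:", n)

        n, _, _ = pc.ReadFrom(small) // starts at the next datagram: 6 bytes
        fmt.Println("second read:", n)
    }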

In Windows, the send buffer does have an effect for UDP. If you blast packets out faster than the network can transmit them, you will eventually fill the socket's output buffer and SendTo will fail with "would block". Increasing SO_SNDBUF helps with this. I had to increase both the send and receive buffers for a test I was doing to find the maximum packet rate I could sustain between a Windows box and a Linux box. I could also have handled the send side by detecting the "would block" error code, sleeping a bit, and retrying, but pumping up the send buffer size was simpler.
The default in Windows is 8 KB, which seems needlessly small in this era of PCs with gigabytes of RAM!
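In Go terms, the workaround described above amounts to something like this sketch (the target address is a placeholder):

    package main

    import (
        "log"
        "net"
    )

    func main() {
        conn, err := net.Dial("udp", "192.0.2.1:9000") // placeholder target
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        udp := conn.(*net.UDPConn)
        if err := udp.SetWriteBuffer(1 << 20); err != nil { // raise SO_SNDBUF well above the default
            log.Fatal(err)
        }
        // ... now blast datagrams with udp.Write ...
    }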

Searching Google for "SO_RECVBUF msdn" gave me...
http://msdn.microsoft.com/en-us/library/ms740476(VS.85).aspx
which answers your "are they per-socket?" question with these lines from the options table:
SO_RCVBUF int Specifies the total per-socket buffer space reserved for receives.
SO_SNDBUF int Specifies the total per-socket buffer space reserved for sends.
With more detail later on:
SO_RCVBUF and SO_SNDBUF
When a Windows Sockets implementation supports the SO_RCVBUF and SO_SNDBUF options, an application can request different buffer sizes (larger or smaller). The call to setsockopt can succeed even when the implementation did not provide the whole amount requested. An application must call getsockopt with the same option to check the buffer size actually provided.
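To check what was actually granted, read the option back after setting it. Here is a sketch for Unix-like systems (Windows uses getsockopt the same way, as the quote says; on Linux the kernel typically reports back double the requested value):

    package main

    import (
        "fmt"
        "log"
        "net"
        "syscall"
    )

    func main() {
        conn, err := net.Dial("tcp", "example.com:80") // placeholder address
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        tcp := conn.(*net.TCPConn)
        tcp.SetReadBuffer(1 << 20) // ask for 1 MB

        raw, err := tcp.SyscallConn()
        if err != nil {
            log.Fatal(err)
        }
        raw.Control(func(fd uintptr) {
            got, err := syscall.GetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_RCVBUF)
            if err != nil {
                log.Fatal(err)
            }
            fmt.Println("actual SO_RCVBUF:", got) // the size actually provided
        })
    }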

The answers above didn't address all the questions, especially the relationship between the socket buffers and the TCP buffers.
I think they are different things at different layers: the TCP buffer is the consumer of the socket buffer.
The socket buffers (input and output) are I/O buffers accessed via system calls from application code in user space.
For example, with the output buffer, the application code can:
Send data immediately while the buffer has room, and be blocked once the buffer is full.
Set the buffer size.
Flush the data in the buffer to the underlying storage (the TCP send buffer).
Close the output buffer by closing the stream.
The TCP buffers (send and receive) are in kernel space, and only the OS can access them.
For example, with the TCP send buffer, the TCP protocol implementation can:
Send packets and accept ACKs.
Guarantee delivery and ordering of packets.
Control congestion by resizing the window of in-flight packets.
By the way, the UDP protocol doesn't have its own buffer, but a UDP socket can still have I/O buffers.
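One way to picture this layering in code is an application-level buffer in user space draining into the kernel's socket send buffer, as in this minimal Go sketch with bufio (the address is a placeholder):

    package main

    import (
        "bufio"
        "log"
        "net"
    )

    func main() {
        conn, err := net.Dial("tcp", "example.com:80") // placeholder address
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        w := bufio.NewWriterSize(conn, 8192)    // user-space output buffer, size settable
        w.WriteString("GET / HTTP/1.0\r\n\r\n") // accumulates in user space
        w.Flush()                               // hand the bytes to the kernel send buffer
    }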
This is my understanding, and I'm more than happy to get any feedback, modifications, or corrections.

Related

http2: PUSH_PROMISE reserved stream id validation

The spec says:
The identifier of a newly established stream MUST be numerically greater than all streams that the initiating endpoint has opened or reserved. This governs streams that are opened using a HEADERS frame and streams that are reserved using PUSH_PROMISE. An endpoint that receives an unexpected stream identifier MUST respond with a connection error (Section 5.4.1) of type PROTOCOL_ERROR.
For the case of a server that sends PUSH_PROMISE, it makes sense to me that conforming servers must send strictly increasing stream IDs. But I don't understand how the client is supposed to detect a violation.
For example, on one connection, if the server sends:
PUSH_PROMISE promised stream 2
PUSH_PROMISE promised stream 4
because of concurrency the client might receive
PUSH_PROMISE promised stream 4
PUSH_PROMISE promised stream 2
The spec would have me think that the client should error on this, but the server did nothing wrong.
What am I missing here?
If the server wrote PUSH_PROMISE[stream=2] and then PUSH_PROMISE[stream=4], then those frames will be delivered in the same order (this is guaranteed by TCP).
It is the task of the client to read from the socket in an ordered way.
For an HTTP/2 implementation the requirement is even stricter: not only does it have to read from the socket in an ordered way, it must also parse the frames in an ordered way.
This is required because the PUSH_PROMISE frame carries an HPACK block, and in order to keep the server's and client's HPACK contexts in sync, the frames (or at least the HPACK blocks of those frames) must be processed in order, so stream=2 before stream=4.
After that, the client is free to process the 2 frames concurrently.
For implementations, this is actually quite simple to achieve, since a thread allocated to perform I/O reads typically does:
    loop
        read bytes from socket
        if no bytes or socket closed -> break loop
        parse read bytes (with HPACK decoding) -> produce frame objects
        pass frame objects to upper software layer
    end loop
Since the read and parse are sequential and no other thread reads from the same socket, the ordering guarantee is met.
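In Go, for example, the loop might look like this sketch built on the golang.org/x/net/http2 Framer (dispatch is a hypothetical upper-layer handler; decoding the HPACK blocks would additionally use the Framer's meta-headers support):

    package h2

    import (
        "net"

        "golang.org/x/net/http2"
    )

    func readLoop(conn net.Conn, dispatch func(http2.Frame)) error {
        framer := http2.NewFramer(conn, conn) // write side, read side
        for {
            frame, err := framer.ReadFrame() // parse one frame, strictly in order
            if err != nil {
                return err // no bytes or socket closed -> break loop
            }
            dispatch(frame) // pass the frame object to the upper software layer
        }
    }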

How to send data packets like ACK and ENQ using golang

I am writing an interface to a clinical lab machine, which uses ASTM protocol for communication (http://makecircuits.com/blog/2010-06-25-astm-protocol.html).
To start with, I am trying to use golang's TCP server to receive data, but I am not able to figure out how to send an ACK back to the lab machine. I am a newbie in golang. Can anyone suggest how I can proceed?
The protocol in the link you supplied is for RS232. Data sent over TCP/IP is a stream, so the receiver has to know where each message ends. Normally, when moving an RS232 protocol to TCP/IP, a header is added to each message containing the message's length (often two bytes). So if you want to send an ASCII ACK, you send three bytes: a two-byte length and one byte of data. If you are writing through a buffered writer, you must also flush it, as such a small packet will otherwise sit in the buffer instead of being transmitted.
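A hedged Go sketch of that framing (the two-byte big-endian length prefix and the analyzer's address are assumptions; match whatever framing your machine's TCP wrapper actually specifies):

    package main

    import (
        "encoding/binary"
        "log"
        "net"
    )

    const ack = 0x06 // ASCII ACK

    func sendACK(conn net.Conn) error {
        msg := make([]byte, 3)
        binary.BigEndian.PutUint16(msg[:2], 1) // payload length: 1 byte
        msg[2] = ack
        _, err := conn.Write(msg) // net.Conn writes are unbuffered, so no flush is needed
        return err
    }

    func main() {
        conn, err := net.Dial("tcp", "192.0.2.10:3000") // placeholder analyzer address
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()
        if err := sendACK(conn); err != nil {
            log.Fatal(err)
        }
    }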

How to query the number of bytes transmitted on a socket

I have a server that uses TCP sockets. I can count the number of bytes sent or received when I do the sending and receiving myself, but if I use functions like TransmitPackets, I can't track the number of bytes that went through that specific socket. Is there any way to get this information from the socket?
You can use the iphlpapi functions to get statistics on the active TCP connections. The GetExtendedTcpTable API returns information on all active connections, including the owning process ID and the TCP tuple; per-connection byte counts are exposed through the related GetPerTcpConnectionEStats API.

Why does the WebSocket RFC allow control frames interleaved with fragmented messages

From WebSocket's RFC 6455, it is possible for control frames to be interleaved with the fragments of a fragmented message.
I don't understand the need for this, as it makes the design more complex for both the sending and the receiving side.
Currently, a control frame can be "Close", "Ping" or "Pong" (everything else is reserved).
If the control frame is "Close", then receiving the end of the fragmented message is useless, so no interleaving would be required (the fragmenting side could just send the "Close" opcode and stop sending fragments, since you are not supposed to send anything after a "Close").
If the control frame is "Ping" or "Pong", it does not make any sense to me. The fragmenting side is already sending data to the peer, so why would it ask whether the peer is alive (it gets that information from the send system call already)? And why reply to a ping immediately, while it is in the middle of sending data to the peer?
So, why do we need this mechanism (of interleaved control frame) at all ?
It is there to detect half-open connections: http://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
The other side could be sending you data but be unable to receive yours. By being able to interleave pings and pongs, it is possible to check that the other end can still understand your messages and reply to them, even in the middle of a long fragmented transfer.
It also does not make things much more complex. You have to read delimited frames anyway; when you find a control frame, you take action and continue reading frames.
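For example, with the gorilla/websocket package a single read loop does exactly that: registered handlers take care of interleaved control frames while fragmented data messages are reassembled transparently. A minimal sketch (handle is a hypothetical consumer of complete messages):

    package wsdemo

    import (
        "time"

        "github.com/gorilla/websocket"
    )

    func readAll(conn *websocket.Conn, handle func([]byte)) error {
        conn.SetPingHandler(func(appData string) error {
            // a Ping may arrive between fragments; answer it and keep reading
            return conn.WriteControl(websocket.PongMessage, []byte(appData),
                time.Now().Add(time.Second))
        })
        for {
            _, msg, err := conn.ReadMessage() // reassembles fragmented messages
            if err != nil {
                return err
            }
            handle(msg)
        }
    }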
http://www.whatwg.org/specs/web-apps/current-work/multipage/network.html#ping-and-pong-frames
Ping and Pong frames
The WebSocket protocol specification defines Ping and Pong frames that can be used for keep-alive, heart-beats, network status probing, latency instrumentation, and so forth. These are not currently exposed in the API.
User agents may send ping and unsolicited pong frames as desired, for example in an attempt to maintain local network NAT mappings, to detect failed connections, or to display latency metrics to the user. User agents must not use pings or unsolicited pongs to aid the server; it is assumed that servers will solicit pongs whenever appropriate for the server's needs.

Optimally reading data from an Asynchronous Socket

I have a problem with a socket library that uses WSAAsyncSelect to put the socket into asynchronous mode. In asynchronous mode the socket is placed into non-blocking mode (WSAEWOULDBLOCK is returned by any operation that would block) and Windows messages are posted to a notification window to inform the application when the socket is ready to be read, written to, etc.
My problem is this: when receiving an FD_READ event, I don't know how many bytes to try to recv. If I pass a buffer that's too small, winsock will automatically post another FD_READ event telling me there's more data to read. If data is arriving very fast, this can saturate the message queue with FD_READ messages, and since WM_TIMER and WM_PAINT messages are only posted when the message queue is empty, an application could stop painting if it's receiving a lot of data through asynchronous sockets with a too-small buffer.
How large should the buffer be, then? I tried using ioctlsocket(FIONREAD) to get the number of bytes available and making the buffer exactly that large, BUT KB192599 explicitly warns that that approach is fraught with inefficiency.
How do I pick a buffer size that's big enough, but not crazy big?
As far as I could ever work out, the value set using setsockopt with the SO_RCVBUF option is an upper bound on the FIONREAD value. So rather than calling ioctlsocket, it should be OK to call getsockopt to find out the SO_RCVBUF setting and use that as the (attempted) size for each recv.
Based on your comment to Aviad P.'s answer, it sounds like this would solve your problem.
(Disclaimer: I have always used FIONREAD myself. But after reading the linked-to KB article I will probably be changing...)
You can make your buffer as big as you can afford without impacting performance, relying on the TCP PSH (push) flag to make your reads return before the buffer fills whenever the sender sent a smaller message.
The PSH flag is set at a logical message boundary (normally after a send operation, unless explicitly set to false). When the receiving end sees the PSH flag on a TCP segment, it completes any blocking read (or asynchronous read, it doesn't matter) with whatever has accumulated in the receive buffer up to the PSH point.
So if your sender is sending reasonably sized messages, you're OK; if it's not, you limit your buffer size such that even if you always fill it completely, you don't negatively impact performance (subjective).
