According to RFC7540:
An HTTP request/response exchange fully consumes a single stream. A request starts with the HEADERS frame that puts the stream into an "open" state. The request ends with a frame bearing END_STREAM, which causes the stream to become "half-closed (local)" for the client and "half-closed (remote)" for the server. A response starts with a HEADERS frame and ends with a frame bearing END_STREAM, which places the stream in the "closed" state.
Knowing that a stream cannot be reopened once it's closed, this means that if I want to implement a long-lived connection where the client sends a stream of requests to the server, I will have to use a new stream for each request. But there is a finite number of streams available, so in theory, I could run out of streams and have to restart the connection.
Why did the writers of the specification design a request/response exchange to completely consume a stream? Wouldn't it have been easy to make a stream like a single thread of exchanges, where you can have multiple exchanges done in serial in one stream?
The point of having many streams multiplexed in a single connection is to interleave them, so that if one cannot proceed, others can.
Reusing a stream for more than one request means just reusing its stream id. I don't see much benefit reusing 4-byte integers -- on the contrary the implementation would become quite more complicated.
For example, the server can inform the client of the last stream that it processed when it's about to close a connection. If stream ids are reused, it would not be possible to report this reliably.
Also, imagine the case where the client sends requestA on stream5; this arrives on the server where its processing takes time; the client times out, sends a RST_STREAM for stream5 (to cancel requestA) and then requestB on stream5. While these are in-flight, the server finishes the processing of requestA and sends the response for requestA on stream5. Now the client reads a response, but it does not know if it is that of requestA or that of requestB.
But there is a finite number of streams available, so in theory, I could run out of streams and have to restart the connection.
That is correct. At 1 ms per exchange, it will take about 12 days to consume the stream ids for a single connection ((2^31-1)/1000/3600/24/2=12.4 days) -- remember that stream ids are incremented by 2 (clients only send odd stream ids).
While this is possible, I have never encountered this case in all the HTTP/2 deployments that I have seen -- typically the connection goes idle and gets closed well before consuming all stream ids.
The specification preferred simplicity and stable features over the ability to reuse stream ids.
Also, bear in mind that HTTP/2 was designed mostly with the web in mind, where browsers make a number of requests to download a web page and its resources, but then stay idle for a while.
The case where an HTTP/2 connection is bombed with non-stop requests is definitely possible, but much rarer and as such it has not probably been deemed important enough in the design -- using 8 bytes for stream ids seems overkill and a cost that is paid for each request even if the 4 bytes limit is never, practically, reached.
Related
Hi guys I was just wondering if the websocket protocol already handles the sending of large data in chunks. At least knowing that it does will save me the time of doing so myself.
According to RFC-6455 base framing, has a maximum size limit of 2^63 bytes which means it actually depends on your client library implementation.
I was just wondering if the websocket protocol already handles the sending of large data in chunks...
Depends what you mean by that.
The WebSockets protocol is frame based (not stream based)
If what you're wondering about is "will a huge payload arrive in one piece?" - the answer is always "yes".
The WebSockets protocol is a frame / message based protocol - not a streaming protocol. Which means that the protocols wraps and unwraps messages in a way that's designed to grantee message ordering and integrity. A messages will not get...
...truncated in the middle (unlike TCP/IP, which is a streaming based protocol, where ordering is preserved, but not message boundaries).
The WebSockets protocol MAY use fragmented "packets"
According to the standard, the protocol may break large messages to smaller chunks. It doesn't have too.
There's a 32 bit compatibility concern that makes some clients / servers fragment messages into smaller fragments and later put them back together on the receiving end (before the onmessage callback is called).
Application layer "chunking" is required for multiplexing
Sending large payloads over a single WebSocket connection will cause a pipelining issue, where other messages will have to wait until the huge payload is sent, received and (if required) re-assembled.
In practice, this means that large payloads should be fragmented by the application layer. This "chunked" application layer approach will enable multiplexing the single WebSocket connection.
The spec says:
The identifier of a newly established stream MUST be numerically
greater than all streams that the initiating endpoint has opened or
reserved. This governs streams that are opened using a HEADERS frame
and streams that are reserved using PUSH_PROMISE. An endpoint that
receives an unexpected stream identifier MUST respond with a
connection error (Section 5.4.1) of type PROTOCOL_ERROR.
For the case of the server that sends PUSH_PROMISE it makes sense to me that conforming servers must send strictly increasing stream ids. But I don't understand how the client is supposed to detect this situation.
For example, on one connection, if the server sends:
PUSH_PROMISE promised stream 2
PUSH_PROMISE promised stream 4
because of concurrency the client might receive
PUSH_PROMISE promised stream 4
PUSH_PROMISE promised stream 2
the spec would have me think that client should error on this, but the server did nothing wrong.
What am I missing here?
If the server wrote PUSH_PROMISE[stream=2] and then PUSH_PROMISE[stream=4], then those frames will be delivered in the same order (this is guaranteed by TCP).
It is a task of a client to read from the socket in an ordered way.
For a HTTP/2 implementation the requirement is even stricter, in that not only it has to read from the socket in an ordered way, but it must also parse the frames in an ordered way.
This is required by the fact that PUSH_PROMISE frame carries a HPACK block and in order to keep the server and client HPACK context in sync, the frames (or at least the HPACK blocks of those frames) must be processed in order, so stream=2 before stream=4.
After that, the client is free to process the 2 frames concurrently.
For implementations, this is actually quite simple to achieve, since a thread allocated to perform I/O reads typically does:
loop
read bytes from socket
if no bytes or socket closed -> break loop
parse read bytes (with HPACK decoding) -> produce frame objects
pass frame objects to upper software layer
end loop
Since the read and parse are sequential and no other thread reads from the same socket, the ordering guarantee is met.
in RFC 7540 section 5.1.1. (https://www.rfc-editor.org/rfc/rfc7540#section-5.1.1), it specifies as following:
The identifier of a newly established stream MUST be numerically greater than all streams that the initiating endpoint has opened or reserved.
I searched a lot on Google, but still no one explained why the stream ID must be in an ascending order. I don't see any benefit from making this rule to the protocol. From my point of view, out of order stream IDs should also work well if the server just consider the "stream ID" as an ID and use it to distinguish HTTP2 request.
So could anyone can help out explaining the exact reason for this specification?
Thanks a lot!
Strictly ascending stream IDs are an easy way to make them unique (per connection), and it's super-easy to implement.
Choosing - like you say - "out of order" stream IDs is potentially more complicated, as it requires to avoid clashes, and potentially consumes more resources, as you have to remember all the stream IDs that are in use.
I don't think there is any particular reason to specify that stream IDs must be ascending apart simplicity.
6.8. GOAWAY
The GOAWAY frame (type=0x7) is used to initiate shutdown of a
connection or to signal serious error conditions. GOAWAY allows an
endpoint to gracefully stop accepting new streams while still
finishing processing of previously established streams. This enables
administrative actions, like server maintenance.
There is an inherent race condition between an endpoint starting new
streams and the remote sending a GOAWAY frame. To deal with this
case, the GOAWAY contains the stream identifier of the last peer-
initiated stream that was or might be processed on the sending
endpoint in this connection. For instance, if the server sends a
GOAWAY frame, the identified stream is the highest-numbered stream
initiated by the client.
Once sent, the sender will ignore frames sent on streams initiated by
the receiver if the stream has an identifier higher than the included
last stream identifier. Receivers of a GOAWAY frame MUST NOT open
additional streams on the connection, although a new connection can
be established for new streams.
If the receiver of the GOAWAY has sent data on streams with a higher
stream identifier than what is indicated in the GOAWAY frame, those
streams are not or will not be processed. The receiver of the GOAWAY
frame can treat the streams as though they had never been created at
all, thereby allowing those streams to be retried later on a new
connection.
From the Websocket's RFC6455,
it's possible that control frames interleave with fragmented frames.
I don't understand the need for it, as it makes the design more complex for both sending and receiving part.
Currently, control frame can be "Close", "Ping" and "Pong" (everything else is reserved).
If the control frame is "Close", then receiving the end of the fragmentation is useless, so no interleaving would be required (the fragmenting side could just send the "Close" opcode and stop sending any more fragment, since you are not supposed to send anything after a "Close").
If the control frame is "Ping" or "Pong", it does not make any sense. The fragmenting side is sending data to the client, so why would it ask for pinging the client if it's alive (it has this information in the send system call already) ? Or reply to a ping immediately, since it's actually sending data to the client ?
So, why do we need this mechanism (of interleaved control frame) at all ?
It is to detect half open connections: http://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
The other side could be sending you data, but unable to get your data. So being able of interleave pings and pongs, it is possible to check that at least the other end can understand your messages and reply to them.
It does not make it much more complex. You have to read delimited frames anyway, when you find a control frame, take action and continue reading more frames.
http://www.whatwg.org/specs/web-apps/current-work/multipage/network.html#ping-and-pong-frames
.3.4 Ping and Pong frames
The WebSocket protocol specification defines Ping and Pong frames that
can be used for keep-alive, heart-beats, network status probing,
latency instrumentation, and so forth. These are not currently exposed
in the API.
User agents may send ping and unsolicited pong frames as desired, for
example in an attempt to maintain local network NAT mappings, to
detect failed connections, or to display latency metrics to the user.
User agents must not use pings or unsolicited pongs to aid the server;
it is assumed that servers will solicit pongs whenever appropriate for
the server's needs.
I have created a client/server program, the client starts
an instance of Writer class and the server starts an instance of
Reader class. Writer will then write a DATA_SIZE bytes of data
asynchronously to the Reader every USLEEP mili seconds.
Every successive async_write request by the Writer is done
only if the "on write" handler from the previous request had
been called.
The problem is, If the Writer (client) is writing more data into the
socket than the Reader (server) is capable of receiving this seems
to be the behaviour:
Writer will start writing into (I think) system buffer and even
though the data had not yet been received by the Reader it will be
calling the "on write" handler without an error.
When the buffer is full, boost::asio won't fire the "on write"
handler anymore, untill the buffer gets smaller.
In the meanwhile, the Reader is still receiving small chunks
of data.
The fact that the Reader keeps receiving bytes after I close
the Writer program seems to prove this theory correct.
What I need to achieve is to prevent this buffering because the
data need to be "real time" (as much as possible).
I'm guessing I need to use some combination of the socket options that
asio offers, like the no_delay or send_buffer_size, but I'm just guessing
here as I haven't had success experimenting with these.
I think that the first solution that one can think of is to use
UDP instead of TCP. This will be the case as I'll need to switch to
UDP for other reasons as well in the near future, but I would
first like to find out how to do it with TCP just for the sake
of having it straight in my head in case I'll have a similar
problem some other day in the future.
NOTE1: Before I started experimenting with asynchronous operations in asio library I had implemented this same scenario using threads, locks and asio::sockets and did not experience such buffering at that time. I had to switch to the asynchronous API because asio does not seem to allow timed interruptions of synchronous calls.
NOTE2: Here is a working example that demonstrates the problem: http://pastie.org/3122025
EDIT: I've done one more test, in my NOTE1 I mentioned that when I was using asio::iosockets I did not experience this buffering. So I wanted to be sure and created this test: http://pastie.org/3125452 It turns out that the buffering is there event with asio::iosockets, so there must have been something else that caused it to go smoothly, possibly lower FPS.
TCP/IP is definitely geared for maximizing throughput as intention of most network applications is to transfer data between hosts. In such scenarios it is expected that a transfer of N bytes will take T seconds and clearly it doesn't matter if receiver is a little slow to process data. In fact, as you noticed TCP/IP protocol implements the sliding window which allows the sender to buffer some data so that it is always ready to be sent but leaves the ultimate throttling control up to the receiver. Receiver can go full speed, pace itself or even pause transmission.
If you don't need throughput and instead want to guarantee that the data your sender is transmitting is as close to real time as possible, then what you need is to make sure the sender doesn't write the next packet until he receives an acknowledgement from the receiver that it has processed the previous data packet. So instead of blindly sending packet after packet until you are blocked, define a message structure for control messages to be sent back from the receiver back to the sender.
Obviously with this approach, your trade off is that each sent packet is closer to real-time of the sender but you are limiting how much data you can transfer while slightly increasing total bandwidth used by your protocol (i.e. additional control messages). Also keep in mind that "close to real-time" is relative because you will still face delays in the network as well as ability of the receiver to process data. So you might also take a look at the design constraints of your specific application to determine how "close" do you really need to be.
If you need to be very close, but at the same time you don't care if packets are lost because old packet data is superseded by new data, then UDP/IP might be a better alternative. However, a) if you have reliable deliver requirements, you might ends up reinventing a portion of tcp/ip's wheel and b) keep in mind that certain networks (corporate firewalls) tend to block UDP/IP while allowing TCP/IP traffic and c) even UDP/IP won't be exact real-time.