How do websockets work in detail? - websocket

There's a fantastic answer which goes into detail as to how REST apis work.
How do websockets work in the similar detail?

Websockets create and represent a standard for bi-directional communication between a server and client. This communication channel creates a TCP connection which is outside of HTTP and is run on a seperate server.
To start this process a handshake is performed between the server and client.
Here is the work flow
1) The user makes an HTTP request to the server with an upgrade header, indicating that the client wishes to establish a WebSocket connection.
2) If the server uses the WebSocket protocol, then it will accept the upgrade and send a response back.
3) With the handshake finished, the WebSocket protocol is used from now on. All communications will use the same underlying TCP port. The new returning status code, 101, signifies Switching Protocols.
As part of HTML5 it should work with most modern browsers.

Related

How does browsers handle DNS lookup and TLS with WebSockets?

I'm currently considering the option of implementing websockets in my application. But before doing so, I want to make sure I understand correctly how it works and if it's gonna be worth it.
I understand the basics: Via WebSockets the handshake will be made only once via HTTP and then will talk to the server in order to switch to a lower level TCP layer, at that point, we have a full-duplex channel between the server and the client.
Currently I'm measuring the ajax requests made to my server (which are many), I've got this information:
The "DNS Lookup", "Initial connection" and "SSL" times are what I'm trying to eliminate (if possible)
For my understanding these times are part of the handshaking process and I'm assuming that using websockets it will happen only on the beginning (the handshake), but I'm not sure.
So my question is: Am I correct? Implementing WebSockets will ensure that the "DNS lookup" and the "Initial connection" steps happen only on the handshake?
Thanks in advance for your help, and sorry if my understanding is wrong.
I understand the basics: Via WebSockets the handshake will be made only once via HTTP and then will talk to the server in order to switch to a lower level TCP layer, at that point, we have a full-duplex channel between the server and the client.
It will not switch to a lower level TCP layer. Instead it will switch the protocol from plain HTTP (request, response) to a message based protocol - which like HTTP is on top of TCP at the application layer and not at a lower level. It's just a different protocol on the same level. It behaves a bit like TCP in that you can send and receive messages at any time without being restricted to request/response scheme of HTTP. But for example TCP is a data stream while WebSockets implements a message oriented protocol.
And, DNS is outside of WebSockets. DNS is only needed to lookup the IP address to establish the TCP connection which then gets used to do the initial HTTP handshake which is needed before the protocol switch to WebSockets.
The situation is similar with TLS. After the DNS lookup to get the IP address the TCP connection is established, then the TLS session on top of the TCP connection is established and then the initial HTTP handshake preceding the switch to WebSockets is done: i.e. HTTP inside a TLS tunnel inside a TCP connection - in other words HTTPS. The WebSocket protocol is then also spoken inside this TLS tunnel, similar how it is done with HTTP.
So my question is: Am I correct? Implementing WebSockets will ensure that the "DNS lookup" and the "Initial connection" steps happen only on the handshake?
Correct. At the beginning of each ws:// connection you might have a DNS lookup if the IP for the name is not already cached. You will have a TCP handshake and you will have a HTTP handshake which then leads to the protocol switch. This is true for every ws:// connection. And for wss:// there is additionally the creation of the TLS tunnel after the TCP connection was established and before the HTTP handshake gets started.

How does WebSockets server architecture work?

I'm trying to get a better understanding of how the server-side architecture works for WebSockets with the goal of implementing it in an embedded application. It seems that there are 3 different server-side software components in play here: 1) the web server to serve static HTTP pages and handle upgrade request, 2) a WebSockets library such as libwebsockets to handle the "nuts and bolts" of WebSockets communications, and 3) my custom application to actually figure out what to do with incoming data. How do all these fit together? Is it common to have a separate web server and WebSocket handling piece, aka a WebSocket server/daemon?
How does my application communicate with the web server and/or WebSockets library to send/receive data? For example, with CGI, the web server uses environmental variables to send info to the custom application, and stdout to receive responses. What is the equivalent communication system here? Or do you typically link in a WebSocket library into the customer application? But then how would communication with the web server to the WebSocket library + custom application work? Or all 3 combined into a single component?
Here's why I am asking. I'm using the boa web server on a uClinux/no MMU platform on a Blackfin processor with limited memory. There is no native WebSocket support in boa, only CGI. I'm trying to figure out how I can add WebSockets support to that. I would prefer to use a compiled solution as opposed to something interpreted such as JavaScript, Python or PHP. My current application using long polling over CGI, which does not provide adequate performance for planned enhancements.
First off, it's important to understand how a webSocket connection is established because that plays into an important relationship between webSocket connections and your web server.
Every webSocket connection starts with an HTTP request. The browser sends an HTTP request to the host/port that the webSocket connection is requested on. That request might look something like this:
GET /chat HTTP/1.1
Host: example.com:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
What distinguishes this request from any other HTTP request to that server is the Upgrade: websocket header in the request. This tells the HTTP server that this particular request is actually a request to initiate a webSocket connection. This header also allows the web server to tell the difference between a regular HTTP request and a request to open a webSocket connection. This allows something very important in the architecture and it was done this way entirely on purpose. This allows the exact same server and port to be used for both serving your web requests and for webSocket connections. All that is needed is a component on your web server that looks for this Upgrade header on all incoming HTTP connections and, if found, it takes over the connection and turns it into a webSocket connection.
Once the server recognizes this upgrade header, it responds with a legal HTTP response, but one that signals the client that the upgrade to the webSocket protocol has been accepted that looks like this:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
At that point, both client and server keep that socket from the original HTTP request open and both switch to the webSocket protocol.
Now, to your specific questions:
How does my application communicate with the web server and/or
WebSockets library to send/receive data?
Your application may use the built-in webSocket support in modern browsers and can initiate a webSocket connection like this:
var socket = new WebSocket("ws://www.example.com");
This will instruct the browser to initiate a webSocket connection to www.example.com use the same port that the current web page was connected with. Because of the built-in webSocket support in the browser, the above HTTP request and upgrade protocol is handled for you automatically from the client.
On the server-side of things, you need to make sure you are using a web server that has incoming webSocket support and that the support is enabled and configured. Because a webSocket connection is a continuous connection once established, it does not really follow the CGI model at all. There must be at least one long-running process handling live webSocket connections. In server models (like CGI), you would need some sort of webServer add-on that supports this long-running process for your webSocket connections. In a server environment like node.js which is already a long running process, the addition of webSockets is no change at all architecturally - but rather just an additional library to support the webSocket protocol.
I'd suggest you may find this article interesting as it discussions this transition from CGI-style single request handling to the continuous socket connections of webSocket:
Web Evolution: from CGI to Websockets (and how it will help you better monitor your cloud infrastructure)
If you really want to stick with the stdin/stdout model, there are libraries that model that for your for webSockets. Here's one such library. Their tagline is "It's like CGI, twenty years later, for WebSockets".
I'm trying to figure out how I can add WebSockets support to that. I
would prefer to use a compiled solution as opposed to something
interpreted such as JavaScript, Python or PHP.
Sorry, but I'm not familiar with that particular server environment. It will likely take some in-depth searching to find out what your options are. Since a webSocket connection is a continuous connection, then you will need a process that is running continuously that can be the server-side part of the webSocket connection. This can either be something built into your webServer or it can be an additional process that the webServer starts up and forwards incoming connections to.
FYI, I have a custom application at home here built on a Raspberry Pi that uses webSockets for real-time communication with browser web pages and it works just fine. I happen to be using node.js for the server environment and the socket.io library that runs on top of webSockets to give me a higher level interface on top of webSockets. My server code checks several hardware sensors on a regular interval and then whenever there is new/changed data to report, it sends messages down any open webSockets so the connected browsers get real-time updates on the sensor readings.
You would likely need some long-running application that incoming webSocket connections were passed from the web server to your long running process or you'd need to make the webSocket connections on a different port than your web server (so they could be fielded by a completely different server process) in which case you'd have a whole separate server to handle your webSocket requests and sockets (this server would also have to support CORS to enable browsers to connect to it since it would be a different port than your web pages).

Does the connection get closed at any point during the WebSocket handshake or immediately after?

According to the Wikipedia article: http://en.wikipedia.org/wiki/WebSocket,
The server sends back this response to the client during handshake:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: HSmrc0sMlYUkAGmm5OPpG2HaGWk=
Sec-WebSocket-Protocol: chat
Does this close the connection (as HTTP responses usually do) or it is kept open throughout the entire handshake and it can start sending WebSocket frames straight away (assuming that it succeeds)?
An HTTP socket going through the handshake process to be upgraded to the webSocket protocol is not closed during that process. The same open socket goes through the whole process and then becomes the socket used for the webSocket protocol. As soon as the upgrade is complete, that very socket is ready for messages to be sent per the webSocket protocol.
It is this use of the exact same socket that enables a webSocket connection to run on the same port as an HTTP request (no extra port is needed) because it literally starts out as an HTTP request (with some extra headers attached) and then when those headers are recognized and both sides agree, the socket from that original HTTP request on the original web port (often port 80) is then switched to use the webSocket protocol. No additional connection on some new port is needed.
I actually find it a relatively elegant design because it makes for easy coexistence with a web server which was an important design parameter. And, a slight extra bit of connection overhead (protocol upgrade negotiation) is generally not an issue because webSocket connections by their very nature are designed to be long running sockets which you open once and use over an extended period of time so a little extra overhead to open them doesn't generally bother their use.
If, for any reason, the upgrade is not completed (both sides don't agree on the upgrade to webSocket), then the socket would remain an HTTP socket and would behave as HTTP sockets normally do (likely getting closed right away, but subject to normal HTTP interactions).
You can see this answer for more details on the back and forth during an upgrade to webSocket: SocketIO tries to connect using same port as the browser used to get web page

ruby HttpClient library closing socket after response with persistent connection?

I'm using the HTTPClient gem (http://github.com/nahi/httpclient) for ruby, to post data to IIS 6.1. Even though both support HTTP 1.1 it seems to be closing the socket after each request made, rather than using persistent connections. I haven't added any flags to enable persistent connections (mainly because having poked about the source code it appears that they should be enabled by default).
The reason I think the socket is being close is that if I watch the requests in Wireshark once each request is made I see FIN/ACK TCP packets sent from the client to the server, then the same sent back the other way.
Am I misreading that or does that mean the socket is being closed?
Wikipedia's article on TCP suggests that the FIN/ACK packets are a signal to terminate the connection. Check which of the client or server initiated the sending of the FIN packet - that's the party requesting that the connection is closed.
As you saw in the source, an HTTP 1.1 implementation should assume that connections are persistent by default.
Is the client specifying HTTP 1.1 in its request and is the server responding accordingly?

Why don't current websocket client implementations support proxies?

A Web Socket detects the presence of a proxy server and automatically sets up a tunnel to pass through the proxy. The tunnel is established by issuing an HTTP CONNECT statement to the proxy server, which requests for the proxy server to open a TCP/IP connection to a specific host and port. Once the tunnel is set up, communication can flow unimpeded through the proxy. Since HTTP/S works in a similar fashion, secure Web Sockets over SSL can leverage the same HTTP CONNECT technique. [1]
OK, sounds useful! But, in the client implementations I've seen thus far (Go [2], Java [3]) I do not see anything related to proxy detection.
Am I missing something or are these implementations just young? I know WebSockets is extremely new and client implementations may be equally young and immature. I just want to know if I'm missing something about proxy detection and handling.
[1] http://www.kaazing.org/confluence/display/KAAZING/What+is+an+HTML+5+WebSocket
[2] http://golang.org/src/pkg/websocket/client.go
[3] http://github.com/adamac/Java-WebSocket-client/raw/master/src/com/sixfire/websocket/WebSocket.java
Let me try to explain the different success rates you may have encountered. While the HTML5 Web Socket protocol itself is unaware of proxy servers and firewalls, it features an HTTP-compatible handshake so that HTTP servers can share their default HTTP and HTTPS ports (80 and 443) with a Web Sockets gateway or server.
The Web Socket protocol defines a ws:// and wss:// prefix to indicate a WebSocket and a WebSocket Secure connection, respectively. Both schemes use an HTTP upgrade mechanism to upgrade to the Web Socket protocol. Some proxy servers are harmless and work fine with Web Sockets; others will prevent Web Sockets from working correctly, causing the connection to fail. In some cases additional proxy server configuration may be required, and certain proxy servers may need to be upgraded to support Web Sockets.
If unencrypted WebSocket traffic flows through an explicit or a transparent proxy server on its way the WebSocket server, then, whether or not the proxy server behaves as it should, the connection is almost certainly bound to fail today (in the future, proxy servers may become Web Socket aware). Therefore, unencrypted WebSocket connections should be used only in the simplest topologies.
If encrypted WebSocket connection is used, then the use of Transport Layer Security (TLS) in the Web Sockets Secure connection ensures that an HTTP CONNECT command is issued when the browser is configured to use an explicit proxy server. This sets up a tunnel, which provides low-level end-to-end TCP communication through the HTTP proxy, between the Web Sockets Secure client and the WebSocket server. In the case of transparent proxy servers, the browser is unaware of the proxy server, so no HTTP CONNECT is sent. However, since the wire traffic is encrypted, intermediate transparent proxy servers may simply allow the encrypted traffic through, so there is a much better chance that the WebSocket connection will succeed if Web Sockets Secure is used. Using encryption, of course, is not free, but often provides the highest success rate.
One way to see it in action is to download and install the Kaazing WebSocket Gateway--a highly optimized, proxy-aware WebSocket gateway, which provides native WebSocket support as well as a full emulation of the standard for older browsers.
The answer is that these clients simply do not support proxies.
-Occam
The communication channel is already established by the time the WebSocket protocol enters the scene. The WebSocket is built on top of TCP and HTTP so you don't have to care about the things already done by these protocols, including proxies.
When a WebSocket connection is established it always starts with a HTTP/TCP connection which is later "upgraded" during the "handshake" phase of WebSocket. At this time the tunnel is established so the proxies are transparent, there's no need to care about them.
Regarding websocket clients and transparent proxies,
I think websocket client connections will fail most of the time for the following reasons (not tested):
If the connection is in clear, since the client does not know it is communicating with a http proxy server, it won't send the "CONNECT TO" instruction that turns the http proxy into a tcp proxy (needed for the client after the websocket handshake). It could work if the proxy supports natively websocket and handles the URL with the ws scheme differently than http.
If the connection is in SSL, the transparent proxy cannot know to which server it should connect to since it has decrypt the host name in the https request. It could by either generating a self-signed certificate on the fly (like for SSLStrip) or providing its own static certificate and decrypt the communication but if the client validates the server certificate it will fail (see https://serverfault.com/questions/369829/setting-up-a-transparent-ssl-proxy).
You mentioned Java proxies, and to respond to that I wanted to mention that Java-Websocket now supports proxies.
You can see the information about that here: http://github.com/TooTallNate/Java-WebSocket/issues/88
websocket-client, a Python package, supports proxies, at the very least over secure scheme wss:// as in that case proxy need no be aware of the traffic it forwards.
https://github.com/liris/websocket-client/commit/9f4cdb9ec982bfedb9270e883adab2e028bbd8e9

Resources