Implementing websocket support for HTTP/2 proxies - websocket

I am developing an https http/2 proxy server as mentioned here:
https://chromium.googlesource.com/chromium/src/+/HEAD/net/docs/proxy.md#HTTPS-proxy-scheme
It is mentioned there that
HTTPS proxies using HTTP/2 can offer better performance in Chrome than a regular HTTP proxy due to higher connection limits (HTTP/1.1 proxies in Chrome are limited to 32 simultaneous connections across all domains).
But when users try to surf a website which is using websocket over raw http connection, the response contains 'Upgrade' http header which is forbidden to be used in http/2 as there is no websockets for HTTP/2.
https://daniel.haxx.se/blog/2016/06/15/no-websockets-over-http2/#:~:text=There%20is%20no%20websockets%20for%20HTTP%2F2.&text=That%20spec%20details%20how%20a,connection%20into%20a%20websockets%20connection.
The problem comes from the fact that when http1.1 traffic passes through an https proxy which implements http/2, headers must be transferred from http1.1 to http/2. Of course when the webpage is using TLS, there is no such a problem as all traffic passes through a connection made by http 'CONNECT' method. The problem occurs when the website does not use TLS.
If there is no solution to this problem, it means that HTTPS proxies should not implement HTTP/2 protocol.

As there is no websockets for HTTP/2
The link you posted is old, and since then WebSocket over HTTP/2 has been specified by RFC 8441.
The behavior of a HTTP/2 proxy is specified in RFC 7540, section 8.3.
When the client communicates with the proxy using secure HTTP/2, each HTTP/2 stream is a "tunnel" to the server.
Client and proxy communicate using secure HTTP/2, and the "tunnelling" happens because each HTTP/2 stream becomes an "infinite" request with an "infinite" response, i.e. an "infinite" series of HTTP/2 DATA frames in both directions that carry an opaque byte array payload.
The job of the proxy is to unwrap the DATA frames received by the client, and forward the byte array payload to the server, perhaps re-wrapping it into an HTTP/2 DATA frame if the communication between proxy and server is also HTTP/2 (it may be possible to optimize away the unwrapping and re-wrapping but it may be tricky -- for example the stream numbering could be different).
When a client attempts to perform WebSocket over HTTP/2, the browser will do as specified by RFC 8441, and the proxy will receive a CONNECT request with a :protocol pseudo-header, and the proxy will have to know what to do in that case, depending on what protocol it uses to communicate with the server.
What above describes what your proxy need to support when the communication with the client is HTTP/2.
If your proxy needs to support clients that speak HTTP/1.1, then you need to implement what is required for a proxy to support HTTP/1.1 and WebSocket proxying, and it's typically not difficult: the first is just HTTP/1.1 forwarding or HTTP CONNECT "tunnelling", the second is WebSocket HTTP Upgrade request forwarding followed by a "tunnel" connection or even a fully fledged WebSocket proxy.
Disclaimer, I have implemented all of the above in the Jetty Project.
You can use Jetty 10 or later to implement an HTTP/1.1, HTTP/2, or WebSocket, both client and server (and therefore a proxy). WebSocket over HTTP/2 is supported too.
Finally, to answer your last question, it is perfectly possible to have secure proxies support HTTP/2, even in presence of WebSocket.
For example, a clear-text WebSocket connection to a server starts with an HTTP upgrade request; this request will be sent, encrypted, to the proxy which will decrypt it and forward it to the server (using any protocol the server supports -- could even be WebSocket over secure HTTP/2); the server will reply with a 101 and upgrade the connection to WebSocket; the proxy will receive the 101 from the server and upgrade to a "tunnel" or to a WebSocket proxy; the proxy encrypts the 101 response and forward it to the client; the client decrypts it and upgrades the connection to WebSocket.
At this point, the client is speaking clear-text WebSocket to the server through a secure connection with the proxy (the proxy sees the clear-text WebSocket communication).
Viceversa, a secure WebSocket connection to a server starts with establishing a HTTP CONNECT tunnel through the proxy to the server; then the client will establish the secure communication with the server; then it will send the HTTP upgrade to WebSocket to the server (through the proxy tunnel).
At this point, the client is speaking secure WebSocket to the server through a secure connection with the proxy (the proxy cannot see the WebSocket communication).
The HTTP/2 variants are similar, just taking into account the HTTP/2 specific ways of "tunnelling" and transporting WebSocket.

Related

HTTPS over Socks5 server implementation

I am trying to implement a Socks5 server that could relay both HTTP and HTTPS traffic.
As the RFC1928 mentions, the following steps to establish a connection and forward the data must be taken :
Client sends a greeting message to the proxy.
Client & proxy authentication (assuming it is successful).
Client sends a request to the proxy to connect to the destination.
The proxy connects to the destination and sends back a response to the client to indicate a successful open tunnel.
The proxy reads the data from the client and forwards it to the destination.
The proxy reads the data from the destination and forwards it to the client.
So far, the proxy works as it should. It is able to relay HTTP traffic using its basic data forwarding mechanism. However, any request from the client to an HTTPS website will be aborted because of SSL/TLS encryption.
Is there another sequence/steps that should be followed to be able to handle SSL/TLS (HTTPS) traffic?
The sequence you have described is correct, even for HTTPS. When the client wants to send a request to an HTTPS server through a proxy, it will request the proxy to connect to the target server's HTTPS port, and then once the tunnel is established, the client will negotiate a TLS handshake with the target server, then send an (encrypted) HTTP request and receive an (encrypted) HTTP response. The tunnel is just a passthrough of raw bytes, the proxy has no concept of any encryption between the client and server. It doesn't care what the bytes represent, its job is just to pass them along as-is.

How do websockets work in detail?

There's a fantastic answer which goes into detail as to how REST apis work.
How do websockets work in the similar detail?
Websockets create and represent a standard for bi-directional communication between a server and client. This communication channel creates a TCP connection which is outside of HTTP and is run on a seperate server.
To start this process a handshake is performed between the server and client.
Here is the work flow
1) The user makes an HTTP request to the server with an upgrade header, indicating that the client wishes to establish a WebSocket connection.
2) If the server uses the WebSocket protocol, then it will accept the upgrade and send a response back.
3) With the handshake finished, the WebSocket protocol is used from now on. All communications will use the same underlying TCP port. The new returning status code, 101, signifies Switching Protocols.
As part of HTML5 it should work with most modern browsers.

How does a browser know if a site supports HTTP/2?

If I type out https://http2.golang.org/ the chrome browser will automatically send the HTTP/2 request. How is this done?
Take stackoverflow for example, when the browser sends a request to stackoverflow.com, it has to do the following steps:
DNS lookup. find the ip address of stackoverflow.
TCP/IP handshake
TLS handshake.
HTTP request/response (Application Protocol).
....
TLS handshake
Regarding step3 TLS handshake, there is an nice explanation by #Oleg.
In order to inspect the detail of TCP/IP packet, You may need use some tools to capture packets. e.g. Wireshark.
Client sends ClientHello to server, which carries several things
supported cipher suite. which cipher suites do you like?
supported TLS version.
a random number.
the supported Application Protocols. e.g. HTTP/2, HTTP 1.1/ Spdy/..
...
Server responds SeverHello, which carries
chosen cipher suite.
chosen TLS version.
a random generated number
And, chosen application protocols in TLS application-layer protocol negotiation (ALPN), e.g. HTTP/2
Conclusion
HTTP2 request/response happens in step4. Before that, browser has already know whether sever support HTTP/2 through TLS handshake.
The chrome browser will only send a HTTP/1.1 Request to the website. As the website is HTTP/2 Enabled, it will send a message to the browser that it supports HTTP/2. The server upgrades the communication protocol between it and the server to HTTP/2 if it finds the browser capable of recognizing HTTP/2.
So, it is generally the server which converts a request to the HTTP/2 Connection. The browser just complies with the upgrade policy of the server.
The chrome browser displays that you have a HTTP/2 connection with the server or website, only after the server upgrades the communication protocol.
The string "h2" identifies the protocol where HTTP/2 uses Transport Layer >Security (TLS) [TLS12].
This identifier is used in the TLS application-layer protocol negotiation (ALPN) >extension [TLS-ALPN] field and in any place where HTTP/2 over TLS is identified.
If server support http2.0 browser will find that server is support http2.0 in TLS application-layer protocol negotiation.
refer link!

What if an HTTP/1.1 client talk to an HTTP/2 only server and what if an HTTP/2 client talk to an HTTP/1.1 only server?

HTTP/2 is definitely the future trend because it is now the standard of HTTP protocol. As we can see in Can I use, 70.15 percent of browsers support the HTTP/2. But HTTP/2 is so new that there are browsers that only support HTTP/1.x and there are many servers that only support HTTP/1.x. I knew that a client can use HTTP upgrade mechanism to negotiate a proper protocol to communicate with the server. For example, if the server supports HTTP/2, their communicating protocol will switch to HTTP/2, otherwise, HTTP/1.x is used. But this only applies to the situdation where the browser the clients used supports both HTTP/2 and HTTP/1.x, right?
But what if a user on a browser that only supports HTTP/1.x wants to communicate with HTTP/2 only server? Will the server ignore the request or send an error back to the user?
And what if a user on a brower that only supports HTTP/2 wants to communicate with HTTP/1.1 only server? I am thinking the process might go like this: The user sends a connecion preface to the server, the server cannot recognize the request, so the user might receive a connection error message. Is this right?
Or is there any browser that supports only HTTP/2?
It's important to take in consideration that the most implementations of HTTP/2 uses it over TLS 1.2 with ALPN protocol (Application Layer Protocol Negotiation). Thus the client just start the standard TLS connection. As the part of such communication the client sends "Client Hello" to the server with some information:
It's like: "Hi, Tom! It's Bob. I speak German, Russian and English. Let's talk a little". And the server send "Server Hello":
"Hi, Bob! I suggest to speak German or English". Then the client send one more short message "OK, then let's speak German" and he start to speak German without waiting of any response from the server:
The whole communication looks like on the picture below
Because both the client and the server start the communication just using TLS 1.2, which the both know. They start the main communication after the protocol negotiation. Thus the problem which you describe could not exist in the practice.
If a browser only supports HTTP/1.1 and the server only supports HTTP/2, they cannot communicate. The server will not recognize what the client sends (in particular there will be no connection preface, which the server treats - following the specification - as a connection error), and will close the connection.
"A browser that only supports HTTP/2" does not exist; if they support HTTP/2, they also support HTTP/1.1. But let's assume that such browser exist.
In this latter case, the server will see the connection preface and will not recognize the PRI method. What exactly the server does in this case depends on the server. It may return a 400 Bad Request, or perhaps just close the connection, or it may trigger an internal server error.
I've tried to visit a http2 only server with curl --http1.1 -i, here is what I got
HTTP/1.0 403 Forbidden
Content-Type: text/plain
Unknown ALPN Protocol, expected `h2` to be available.
If this is a HTTP request: The server was not configured with the `allowHTTP1` option or a listener for the `unknownProtocol` event.

Why don't current websocket client implementations support proxies?

A Web Socket detects the presence of a proxy server and automatically sets up a tunnel to pass through the proxy. The tunnel is established by issuing an HTTP CONNECT statement to the proxy server, which requests for the proxy server to open a TCP/IP connection to a specific host and port. Once the tunnel is set up, communication can flow unimpeded through the proxy. Since HTTP/S works in a similar fashion, secure Web Sockets over SSL can leverage the same HTTP CONNECT technique. [1]
OK, sounds useful! But, in the client implementations I've seen thus far (Go [2], Java [3]) I do not see anything related to proxy detection.
Am I missing something or are these implementations just young? I know WebSockets is extremely new and client implementations may be equally young and immature. I just want to know if I'm missing something about proxy detection and handling.
[1] http://www.kaazing.org/confluence/display/KAAZING/What+is+an+HTML+5+WebSocket
[2] http://golang.org/src/pkg/websocket/client.go
[3] http://github.com/adamac/Java-WebSocket-client/raw/master/src/com/sixfire/websocket/WebSocket.java
Let me try to explain the different success rates you may have encountered. While the HTML5 Web Socket protocol itself is unaware of proxy servers and firewalls, it features an HTTP-compatible handshake so that HTTP servers can share their default HTTP and HTTPS ports (80 and 443) with a Web Sockets gateway or server.
The Web Socket protocol defines a ws:// and wss:// prefix to indicate a WebSocket and a WebSocket Secure connection, respectively. Both schemes use an HTTP upgrade mechanism to upgrade to the Web Socket protocol. Some proxy servers are harmless and work fine with Web Sockets; others will prevent Web Sockets from working correctly, causing the connection to fail. In some cases additional proxy server configuration may be required, and certain proxy servers may need to be upgraded to support Web Sockets.
If unencrypted WebSocket traffic flows through an explicit or a transparent proxy server on its way the WebSocket server, then, whether or not the proxy server behaves as it should, the connection is almost certainly bound to fail today (in the future, proxy servers may become Web Socket aware). Therefore, unencrypted WebSocket connections should be used only in the simplest topologies.
If encrypted WebSocket connection is used, then the use of Transport Layer Security (TLS) in the Web Sockets Secure connection ensures that an HTTP CONNECT command is issued when the browser is configured to use an explicit proxy server. This sets up a tunnel, which provides low-level end-to-end TCP communication through the HTTP proxy, between the Web Sockets Secure client and the WebSocket server. In the case of transparent proxy servers, the browser is unaware of the proxy server, so no HTTP CONNECT is sent. However, since the wire traffic is encrypted, intermediate transparent proxy servers may simply allow the encrypted traffic through, so there is a much better chance that the WebSocket connection will succeed if Web Sockets Secure is used. Using encryption, of course, is not free, but often provides the highest success rate.
One way to see it in action is to download and install the Kaazing WebSocket Gateway--a highly optimized, proxy-aware WebSocket gateway, which provides native WebSocket support as well as a full emulation of the standard for older browsers.
The answer is that these clients simply do not support proxies.
-Occam
The communication channel is already established by the time the WebSocket protocol enters the scene. The WebSocket is built on top of TCP and HTTP so you don't have to care about the things already done by these protocols, including proxies.
When a WebSocket connection is established it always starts with a HTTP/TCP connection which is later "upgraded" during the "handshake" phase of WebSocket. At this time the tunnel is established so the proxies are transparent, there's no need to care about them.
Regarding websocket clients and transparent proxies,
I think websocket client connections will fail most of the time for the following reasons (not tested):
If the connection is in clear, since the client does not know it is communicating with a http proxy server, it won't send the "CONNECT TO" instruction that turns the http proxy into a tcp proxy (needed for the client after the websocket handshake). It could work if the proxy supports natively websocket and handles the URL with the ws scheme differently than http.
If the connection is in SSL, the transparent proxy cannot know to which server it should connect to since it has decrypt the host name in the https request. It could by either generating a self-signed certificate on the fly (like for SSLStrip) or providing its own static certificate and decrypt the communication but if the client validates the server certificate it will fail (see https://serverfault.com/questions/369829/setting-up-a-transparent-ssl-proxy).
You mentioned Java proxies, and to respond to that I wanted to mention that Java-Websocket now supports proxies.
You can see the information about that here: http://github.com/TooTallNate/Java-WebSocket/issues/88
websocket-client, a Python package, supports proxies, at the very least over secure scheme wss:// as in that case proxy need no be aware of the traffic it forwards.
https://github.com/liris/websocket-client/commit/9f4cdb9ec982bfedb9270e883adab2e028bbd8e9

Resources