How to make the client of a proxy server keep alive?

I want to make the client of my proxy server use keepAlive, so that the proxy client doesn't have to do a TCP close handshake every time.
Please look at this example in Netty.
Adding the keepAlive option to this example doesn't seem to work properly, because it creates a client and connects every time the server gets a request, and closes the client when the response arrives.
Then how can I make my proxy client keepAlive? Is there any reference/example for it?

Using the SO_KEEPALIVE socket option doesn't mean that the server (or the other peer in the connection) should ignore an explicit request to close the connection. It helps in cases like:
Idle sessions timing out or getting killed by the other end due to inactivity
Idle or long-running requests being disconnected by a firewall in between after a certain time passes (e.g. 1 hour, for resource clean-up purposes).
If the client's logic is not to re-use the same socket for different requests (i.e. if its application logic uses a new socket for each request), there's nothing you can do about that on your proxy.
The same argument is valid for the "back-end" side of your proxy as well. If the server you're proxying to doesn't allow the socket to be re-used, and explicitly closes a connection after a request is served, that wouldn't work as you wanted either.
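To make the distinction concrete, here is a minimal sketch in Go (not Netty, but the same idea applies to any client): enabling SO_KEEPALIVE only turns on TCP-level probes on an idle connection, while what you actually want is application-level re-use of the connection. The host name and timeouts below are made-up placeholders.
package main

import (
    "fmt"
    "io"
    "net"
    "net/http"
    "time"
)

func main() {
    // TCP-level keep-alive (SO_KEEPALIVE): the kernel probes an idle
    // connection so middleboxes don't silently drop it. It does NOT stop
    // either peer from closing the connection explicitly.
    conn, err := net.Dial("tcp", "example.com:80") // placeholder target
    if err != nil {
        panic(err)
    }
    tcp := conn.(*net.TCPConn)
    tcp.SetKeepAlive(true)
    tcp.SetKeepAlivePeriod(30 * time.Second)
    conn.Close()

    // Application-level re-use (HTTP keep-alive): an idle-connection pool
    // serves later requests over the same socket, which is what actually
    // avoids a TCP handshake per request.
    client := &http.Client{Transport: &http.Transport{
        MaxIdleConnsPerHost: 10,
        IdleConnTimeout:     90 * time.Second,
    }}
    for i := 0; i < 3; i++ {
        resp, err := client.Get("http://example.com/")
        if err != nil {
            panic(err)
        }
        io.Copy(io.Discard, resp.Body) // drain the body...
        resp.Body.Close()              // ...and close it so the connection can be re-used
        fmt.Println(resp.Status)
    }
}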

If you are not closing the connection on your side then the proxy is. Different proxy servers will behave in different ways.
Try sending Connection: Keep-Alive as a header.
If that doesn't work, try also sending Proxy-Connection: Keep-Alive as a header.
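For illustration, a rough Go sketch of what sending those headers through a proxy looks like on the wire; the proxy address is a made-up placeholder, and note that Proxy-Connection is a non-standard header that only some older proxies honor.
package main

import (
    "bufio"
    "fmt"
    "net"
)

func main() {
    conn, err := net.Dial("tcp", "127.0.0.1:3128") // placeholder proxy address
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    // A request sent to an HTTP proxy uses the absolute URL in the request
    // line. "Connection: Keep-Alive" is already the default in HTTP/1.1 but
    // stating it is harmless; "Proxy-Connection: Keep-Alive" is the
    // non-standard variant some older proxies look for.
    fmt.Fprint(conn, "GET http://example.com/ HTTP/1.1\r\n"+
        "Host: example.com\r\n"+
        "Connection: Keep-Alive\r\n"+
        "Proxy-Connection: Keep-Alive\r\n\r\n")

    status, err := bufio.NewReader(conn).ReadString('\n')
    if err != nil {
        panic(err)
    }
    fmt.Print(status) // e.g. "HTTP/1.1 200 OK"
}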

Related

Does the protocol used by HTTP proxies reduce the number of connections negotiated by the client?

When an HTTP proxy server is used, is the number of connections the client has to negotiate reduced compared to connecting directly to various HTTP sites (without a proxy)?
For example, when connecting directly to two different domains, it is clear that at least two connections must be made. In the case of a proxy, does the client usually use a single connection to the proxy for both "connections"?
Similarly, are there cases where a client connecting to a single domain but accessing several resources would see a reduced number of connections when using a proxy? E.g., can the proxy present an HTTP/1.1-style persistent connection even when the ultimate destination doesn't support it? Are proxies able to use longer persistent-connection timeout periods?
In the case of a proxy, does the client usually use a single connection to the proxy for both "connections"?
While it would be possible to use the same connection to an HTTP proxy for HTTP requests to different targets, most clients don't do it from what I've seen. Also, it would only work with HTTP and not HTTPS, since in the latter case the whole TLS connection to the target is tunneled through the proxy, and the close of this tunneled connection is also the close of the underlying TCP connection to the proxy. And HTTP requests to multiple targets would only be possible with an HTTP proxy, not a SOCKS proxy, since SOCKS essentially builds a tunnel to a specific target and this target is set at the beginning of the connection and can never be changed.
That said, while I've not seen it for browser-to-proxy connections, I've seen a patched squid used (long ago) to do this in order to optimize proxy-to-proxy connections.
E.g., can the proxy present an HTTP/1.1-style persistent connection even when the ultimate destination doesn't support it?
While this would be possible too, it is also not common. Usually the proxy does not fully decouple client and server, i.e. a server-triggered close of the connection between server and proxy usually results in a close of the connection between proxy and client too. The reason is probably that it would only work for HTTP anyway (not HTTPS), and that it makes the proxy implementation more complex: things like retrying a request when the server suddenly closes a persistent connection between requests would now need to be handled by the proxy instead of simply forwarding the close and letting the client deal with it.
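For completeness, here is what re-using one proxy connection for requests to different targets would look like, as a hedged Go sketch; it assumes a plain HTTP proxy at a made-up address, and as said above this cannot work for HTTPS or SOCKS.
package main

import (
    "bufio"
    "fmt"
    "io"
    "net"
    "net/http"
)

func main() {
    conn, err := net.Dial("tcp", "127.0.0.1:3128") // placeholder proxy address
    if err != nil {
        panic(err)
    }
    defer conn.Close()
    r := bufio.NewReader(conn)

    // Two plain-HTTP requests to *different* targets over the same TCP
    // connection to the proxy. This only works if the proxy keeps the
    // client connection open between requests.
    for _, target := range []string{"example.com", "example.org"} {
        fmt.Fprintf(conn,
            "GET http://%s/ HTTP/1.1\r\nHost: %s\r\nConnection: keep-alive\r\n\r\n",
            target, target)
        resp, err := http.ReadResponse(r, nil)
        if err != nil {
            panic(err)
        }
        io.Copy(io.Discard, resp.Body) // drain the body before the next request
        resp.Body.Close()
        fmt.Println(target, "->", resp.Status)
    }
}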

HAProxy is not load balancing due to persistent connections

We have a web server and a client, both written in Go, that interact with each other. We want HAProxy to load balance requests between several instances of the server, but it's not working. The client will always connect to the same server while it's still up.
If I look at the output of "netstat -anp", I can see that there is a persistent connection established between the client and the server through HAProxy. I tried setting the Connection header in the response to "close", but that didn't work at all.
Needless to say, I'm completely confused by this. My first question is, is this a problem with the client, the server, or HAProxy? How does one force the client to disconnect? Am I missing something here? Curl works fine, so I know that HAProxy does load balance, but curl also completely shuts down when finished, which is why I suspect the persistent connection is what's causing me issues, since the client and server are long-running.
Just as an FYI, I'm using go-martini on the server.
Thanks.
HTTP/1.1 uses KeepAlive by default. Since the connections aren't closed, HAProxy can't balance the requests between different backends.
You have a couple options:
Force the connection to close after each request in your code. Setting Request.Close = true on either the client or the server will send a Connection: close header, telling both sides to close the TCP connection (see the sketch after these options).
Alternatively you could have HAProxy alter the requests by setting http-server-close so the backend connection is closed after each request, or httpclose to shut down both sides after each request.
http-server-close is usually the best option, since that still maintains persistent connections for the client, while proxying each request individually.
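For the first option, here is a hedged sketch of the client side in Go (the URL is a placeholder for wherever your HAProxy frontend listens):
package main

import (
    "fmt"
    "net/http"
)

func main() {
    req, err := http.NewRequest("GET", "http://my-haproxy.example:8080/", nil) // placeholder frontend address
    if err != nil {
        panic(err)
    }
    // Sends "Connection: close", so the TCP connection is torn down after
    // this request and the next request gets a fresh balancing decision.
    req.Close = true

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}
On the server side the equivalent is to set a Connection: close response header; the HAProxy directives mentioned above are option http-server-close and option httpclose in the frontend/backend configuration.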

Does the connection get closed at any point during the WebSocket handshake or immediately after?

According to the Wikipedia article: http://en.wikipedia.org/wiki/WebSocket,
The server sends back this response to the client during the handshake:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: HSmrc0sMlYUkAGmm5OPpG2HaGWk=
Sec-WebSocket-Protocol: chat
Does this close the connection (as HTTP responses usually do), or is it kept open throughout the entire handshake so it can start sending WebSocket frames straight away (assuming that it succeeds)?
An HTTP socket going through the handshake process to be upgraded to the webSocket protocol is not closed during that process. The same open socket goes through the whole process and then becomes the socket used for the webSocket protocol. As soon as the upgrade is complete, that very socket is ready for messages to be sent per the webSocket protocol.
It is this use of the exact same socket that enables a webSocket connection to run on the same port as an HTTP request (no extra port is needed) because it literally starts out as an HTTP request (with some extra headers attached) and then when those headers are recognized and both sides agree, the socket from that original HTTP request on the original web port (often port 80) is then switched to use the webSocket protocol. No additional connection on some new port is needed.
I actually find it a relatively elegant design because it makes for easy coexistence with a web server which was an important design parameter. And, a slight extra bit of connection overhead (protocol upgrade negotiation) is generally not an issue because webSocket connections by their very nature are designed to be long running sockets which you open once and use over an extended period of time so a little extra overhead to open them doesn't generally bother their use.
If, for any reason, the upgrade is not completed (both sides don't agree on the upgrade to webSocket), then the socket would remain an HTTP socket and would behave as HTTP sockets normally do (likely getting closed right away, but subject to normal HTTP interactions).
You can see this answer for more details on the back and forth during an upgrade to webSocket: SocketIO tries to connect using same port as the browser used to get web page
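To make the "same socket" point concrete, here is a rough Go sketch of a server that detects the upgrade request and keeps talking on the very connection the HTTP GET arrived on. The Sec-WebSocket-Accept computation and the webSocket frame encoding are omitted, so this is an illustration, not a working webSocket server.
package main

import (
    "fmt"
    "net/http"
    "strings"
)

func main() {
    http.HandleFunc("/chat", func(w http.ResponseWriter, r *http.Request) {
        if !strings.EqualFold(r.Header.Get("Upgrade"), "websocket") {
            fmt.Fprintln(w, "plain HTTP response") // normal HTTP handling
            return
        }
        // Upgrade requested: take over the underlying TCP connection.
        hj, ok := w.(http.Hijacker)
        if !ok {
            http.Error(w, "hijacking not supported", http.StatusInternalServerError)
            return
        }
        conn, bufrw, err := hj.Hijack()
        if err != nil {
            return
        }
        defer conn.Close()
        // The 101 response goes out on the SAME socket the GET came in on
        // (a real server would also send Sec-WebSocket-Accept here) ...
        bufrw.WriteString("HTTP/1.1 101 Switching Protocols\r\nUpgrade: websocket\r\nConnection: Upgrade\r\n\r\n")
        bufrw.Flush()
        // ... and from this point on, conn speaks webSocket framing, not HTTP.
    })
    http.ListenAndServe(":8080", nil)
}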

SocketIO tries to connect using same port as the browser used to get web page

I am serving content locally, accessible through http://0.0.0.0:4000. That works ok, I get a correct webpage, which contains the following line inside a script:
var socket = io('http://example.com');
i.e. I am referencing an external server. Now my browser shows the following error:
GET http://example.com:4000/socket.io/?EIO=3&transport=polling&t=1417447089410-1 net::ERR_CONNECTION_REFUSED
That is, the browser is trying to connect using the same port that it used to get the original page.
Everything works fine when both the SocketIO server and the web server listen on the same port.
Am I missing something? Is this a bug? Is there a workaround? Thank you.
You can read here about how a plain webSocket is initially set up. It all starts with a somewhat standard HTTP GET request, but one that has some special headers set:
GET /chat HTTP/1.1
Host: example.com:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
The request may also carry an Origin header, which lets the host restrict connections to web pages from certain origins. While this header can be spoofed by non-browser agents (so the server has to be prepared for that), it will likely be correct when the OP is using a real browser (assuming no proxy is modifying it).
If the server accepts the incoming request, it will then return an HTTP response that looks something like this:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
At this point, the socket which used to be an HTTP socket is now a webSocket and both endpoints have agreed that they're going to use the webSocket data format from now on. This initial connection may be followed by some form of authentication, or new or existing cookies can be used for authentication during the initial HTTP portion of the connection.
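As an aside, the Sec-WebSocket-Accept value in that response is derived from the client's Sec-WebSocket-Key; a small Go sketch of the derivation defined in RFC 6455 (the GUID string is fixed by the RFC):
package main

import (
    "crypto/sha1"
    "encoding/base64"
    "fmt"
)

// acceptKey appends the fixed RFC 6455 GUID to the client's
// Sec-WebSocket-Key, hashes the result with SHA-1 and base64-encodes the digest.
func acceptKey(secWebSocketKey string) string {
    const guid = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
    sum := sha1.Sum([]byte(secWebSocketKey + guid))
    return base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
    // The key from the example request above.
    fmt.Println(acceptKey("dGhlIHNhbXBsZSBub25jZQ==")) // prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
}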
socket.io adds some enhancements on top of this by initially requesting a particular path of /socket.io and adding some parameters to the URL. This allows socket.io to negotiate whether it's going to use long polling or a webSocket so there are some exchanges between client/server with socket.io before the above webSocket is initialized.
So, back to your question. The socket.io server simply watches all incoming web requests on the normal web port (looking for both its special path and the special headers that indicate a webSocket initiation rather than a classic HTTP request). So, it runs over the same port as the web server. This is done for a bunch of reasons, all of which provide convenience to the server and server infrastructure since they don't have to configure their network to accept anything other than the usual port 80 they were already accepting (or whatever port they were already using for web requests).
By default in socket.io, the domain and port will default to the same domain and port as the web page you are on. So, if you don't specify one or the other in your connect call, it will use the domain or port from the web page you are on. If you want to use both a different domain and port, then you must specify both of them.

When should one use CONNECT and GET HTTP methods at HTTP Proxy Server?

I'm building a WebClient library. Now I'm implementing a proxy feature, so I am doing some research and I saw some code using the CONNECT method to request a URL.
But when I check within my web browser, it doesn't use the CONNECT method; it calls the GET method instead.
So I'm confused. When should I use each of the two methods?
TL;DR a web client uses CONNECT only when it knows it talks to a proxy and the final URI begins with https://.
When a browser says:
CONNECT www.google.com:443 HTTP/1.1
it means:
Hi proxy, please open a raw TCP connection to Google; any following bytes I write, you just repeat over that connection without any interpretation. Oh, and one more thing. Do that only if you talk to Google directly, but if you use another proxy yourself, instead you just tell them the same CONNECT.
Note how this says nothing about TLS (https). In fact CONNECT is orthogonal to TLS; you can have just one of them, the other, or both.
That being said, the intent of CONNECT is to allow end-to-end encrypted TLS session, so the data is unreadable to a proxy (or a whole proxy chain). It works even if a proxy doesn't understand TLS at all, because CONNECT can be issued inside plain HTTP and requires from the proxy nothing more than copying raw bytes around.
But the connection to the first proxy can itself be TLS (https), although that means double encryption of the traffic between you and the first proxy.
Obviously, it makes no sense to CONNECT when talking directly to the final server. You just start talking TLS and then issue HTTP GET. The end servers normally disable CONNECT altogether.
To a proxy, CONNECT support adds security risks. Any data can be passed through CONNECT, even an SSH hacking attempt against a server on 192.168.1.*, even SMTP sending spam. The outside world sees these attacks as regular TCP connections initiated by the proxy; it doesn't care what the reason is, and it cannot check whether an HTTP CONNECT is to blame. Hence it's up to proxies to secure themselves against misuse.
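Put together, a client talking through a proxy with CONNECT looks roughly like this Go sketch; the proxy address is a made-up placeholder, and real code would need deadlines and better error handling.
package main

import (
    "bufio"
    "crypto/tls"
    "fmt"
    "net"
    "net/http"
)

func main() {
    // 1. Plain TCP connection to the proxy (placeholder address).
    conn, err := net.Dial("tcp", "127.0.0.1:3128")
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    // 2. Ask the proxy for a raw tunnel to the target.
    fmt.Fprint(conn, "CONNECT www.google.com:443 HTTP/1.1\r\nHost: www.google.com:443\r\n\r\n")
    resp, err := http.ReadResponse(bufio.NewReader(conn), &http.Request{Method: "CONNECT"})
    if err != nil || resp.StatusCode != 200 {
        panic("tunnel not established")
    }

    // 3. From here on the proxy only copies bytes, so start TLS end-to-end
    //    with the target inside the tunnel; the proxy cannot read it.
    tlsConn := tls.Client(conn, &tls.Config{ServerName: "www.google.com"})
    if err := tlsConn.Handshake(); err != nil {
        panic(err)
    }

    // 4. Speak HTTP inside the tunnel as if connected directly.
    fmt.Fprint(tlsConn, "GET / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n")
    status, _ := bufio.NewReader(tlsConn).ReadString('\n')
    fmt.Print(status)
}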
A CONNECT request urges your proxy to establish an HTTP tunnel to the remote end-point.
Usually it is used for SSL connections, though it can be used with HTTP as well (for the purposes of proxy chaining and tunneling).
CONNECT www.google.com:443
The above line opens a connection from your proxy to www.google.com on port 443.
After this, content that is sent by the client is forwarded by the proxy to www.google.com:443.
If a user tries to retrieve a page at http://www.google.com, the proxy can send the exact same request and retrieve the response on his behalf.
With SSL(HTTPS), only the two remote end-points understand the requests, and the proxy cannot decipher them. Hence, all it does is open that tunnel using CONNECT, and lets the two end-points (webserver and client) talk to each other directly.
Proxy Chaining:
If you are chaining 2 proxy servers, this is the sequence of requests to be issued.
GET1 is the original GET request (HTTP URL)
CONNECT1 is the original CONNECT request (SSL/HTTPS URL or Another Proxy)
User Request ==CONNECT1==> (Your_Primary_Proxy ==CONNECT==> AnotherProxy-1 ... ==CONNECT==> AnotherProxy-n) ==GET1(IF is http)/CONNECT1(IF is https)==> Destination_URL
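For reference, the proxy side of CONNECT is little more than "dial the target, reply 200, then copy bytes both ways"; a hedged Go sketch, without the timeouts, access control and error handling a real proxy needs:
package main

import (
    "io"
    "net"
    "net/http"
)

// handleConnect implements the proxy side of CONNECT: dial the requested
// host:port, tell the client the tunnel is up, then blindly copy bytes in
// both directions. The proxy never interprets (or decrypts) the traffic.
func handleConnect(w http.ResponseWriter, r *http.Request) {
    dst, err := net.Dial("tcp", r.Host) // for CONNECT, r.Host is "host:port"
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadGateway)
        return
    }
    hj, ok := w.(http.Hijacker)
    if !ok {
        http.Error(w, "hijacking not supported", http.StatusInternalServerError)
        return
    }
    src, _, err := hj.Hijack()
    if err != nil {
        dst.Close()
        return
    }
    src.Write([]byte("HTTP/1.1 200 Connection established\r\n\r\n"))
    // From now on it is a transparent byte pipe in both directions.
    go func() { io.Copy(dst, src); dst.Close() }()
    go func() { io.Copy(src, dst); src.Close() }()
}

func main() {
    http.ListenAndServe(":3128", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if r.Method == http.MethodConnect {
            handleConnect(w, r)
            return
        }
        http.Error(w, "plain-HTTP proxying not implemented in this sketch", http.StatusNotImplemented)
    }))
}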
As a rule of thumb, GET is used for plain HTTP and CONNECT for HTTPS.
There are more details though, so you probably want to read the relevant RFCs:
http://www.ietf.org/rfc/rfc2068.txt
http://www.ietf.org/rfc/rfc2817.txt
The CONNECT method converts the request connection to a transparent TCP/IP tunnel, usually to facilitate SSL-encrypted communication (HTTPS) through an unencrypted HTTP proxy.
