Any HTTP proxies with explicit, configurable support for request/response buffering and delayed connections?

When dealing with mobile clients it is very common to have multi-second delays during the transmission of HTTP requests. If you are serving pages or services out of a prefork Apache, the child processes will be tied up for seconds serving a single mobile client, even if your app server logic is done in 5ms. I am looking for an HTTP server, balancer or proxy server that supports the following:
1. A request arrives at the proxy. The proxy starts buffering the request, in RAM or on disk, including headers and POST/PUT bodies. The proxy DOES NOT open a connection to the backend server. This is probably the most important part.
2. The proxy stops buffering the request when:
   - a size limit has been reached (say, 4 KB), or
   - the request has been received completely, headers and body.
3. Only now, with (part of) the request in memory, is a connection opened to the backend and the request relayed.
4. The backend sends back the response. Again, the proxy starts buffering it immediately (up to a more generous size, say 64 KB).
5. Since the proxy has a big enough buffer, the backend response is stored completely in the proxy in a matter of milliseconds, and the backend process/thread is free to handle more requests. The backend connection is closed immediately.
6. The proxy sends the response back to the mobile client, as fast or as slow as the client is capable of, without a connection to the backend tying up resources.
I am fairly sure you can do 4-6 with Squid, and nginx appears to support 1-3 (and looks fairly unique in this respect). My question is: is there any proxy server that emphasizes these buffering and not-opening-connections-until-ready capabilities? Maybe there is just a bit of Apache config-fu that makes this buffering behaviour trivial? And is there one that is not a dinosaur like Squid and that supports a lean single-process, asynchronous, event-based execution model?
(Side rant: I would be using nginx, but it doesn't support chunked POST bodies, making it useless for serving stuff to mobile clients. Yes, cheap $50 handsets love chunked POSTs... sigh)

What about using both nginx and Squid (client — Squid — nginx — backend)? When returning data from a backend, Squid converts it from Transfer-Encoding: chunked to a regular stream with Content-Length set, so maybe it can normalize POSTs as well.

Nginx can do everything you want. The configuration parameters you are looking for are
http://wiki.codemongers.com/NginxHttpCoreModule#client_body_buffer_size
and
http://wiki.codemongers.com/NginxHttpProxyModule#proxy_buffer_size
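For illustration, a proxy block along these lines should give the behaviour described in 1-3; the directive values and the upstream name are only placeholders, not a recommendation:

```nginx
location / {
    # Buffer the client request (headers and body) in memory before a
    # backend connection is opened; larger bodies spill to a temp file.
    client_body_buffer_size 16k;
    client_max_body_size    1m;

    # Buffer the backend response so the upstream is released quickly,
    # even if the mobile client drains it slowly.
    proxy_buffering   on;
    proxy_buffer_size 8k;
    proxy_buffers     8 8k;

    proxy_pass http://backend;   # placeholder upstream
}
```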

Fiddler, a free tool from Telerik, does at least some of the things you're looking for.
Specifically, go to Rules | Custom Rules... and you can add arbitrary JavaScript code at all points during the connection. You could simulate some of the things you need with sleep() calls.
I'm not sure this method gives you the fine-grained buffering control you want, however. Still, something might be better than nothing?

Squid 2.7 can support 1-3 with a patch:
http://www.squid-cache.org/Versions/v2/HEAD/changesets/12402.patch
I've tested this and found it to work well, with the proviso that it only buffers to memory, not disk (unless it swaps, of course, and you don't want that), so you need to run it on a box that's appropriately provisioned for your workload.
Chunked POSTs are a problem for most servers and intermediaries. Are you sure you need to support them? Usually clients should retry the request when they get a 411.

Unfortunately, I'm not aware of a ready-made solution for this. In the worst case scenario, consider developing it yourself, say, using Java NIO -- it shouldn't take more than a week.
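If you did go down that road, the core behaviour from the question is not much code. Here is a very rough sketch using Python's asyncio rather than Java NIO, purely to keep it short; the addresses and limits are made up, and real code would need proper HTTP parsing and error handling:

```python
import asyncio

BACKEND = ("127.0.0.1", 8080)   # hypothetical backend address
REQ_BUFFER_LIMIT = 4 * 1024     # buffer up to ~4 KB of the request first
RESP_BUFFER_LIMIT = 64 * 1024   # buffer up to ~64 KB of the response

async def handle_client(reader, writer):
    # Steps 1-2: buffer the request without touching the backend yet.
    request = b""
    while len(request) < REQ_BUFFER_LIMIT:
        chunk = await reader.read(4096)
        if not chunk:
            break
        request += chunk
        if b"\r\n\r\n" in request:      # naive "headers complete" check
            break

    # Step 3: only now open the backend connection and relay what we have
    # (a real proxy would keep relaying any remaining request body here).
    backend_reader, backend_writer = await asyncio.open_connection(*BACKEND)
    backend_writer.write(request)
    await backend_writer.drain()

    # Steps 4-5: slurp the response quickly and free the backend
    # (a real proxy would loop until EOF or the size limit).
    response = await backend_reader.read(RESP_BUFFER_LIMIT)
    backend_writer.close()
    await backend_writer.wait_closed()

    # Step 6: dribble the response out to the (possibly slow) client.
    writer.write(response)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle_client, "0.0.0.0", 8888)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```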

Related

Caching proxy for all traffic

I am trying to find (or write) a caching proxy tool that accepts all traffic from a specific container on my localhost (using iptables). What I want to do with this traffic is to save it and cache the response, and later, if I see that a request was already sent to a server, return the cached response to the requesting party (and not send the request to the server again, because a similar request was already sent).
I'm not sure exactly how big the problem I'm trying to deal with here is. I want to do it for all traffic, including HTTP, TLS and other TCP-based traffic (database connections and such). I tried mitmproxy, and it seems to deal pretty well with HTTP and the TLS part, but intercepting raw TCP traffic (for databases etc.) is not possible.
Any advice or resources I can use to accomplish that? (Not necessarily in Python.) How complex do you think this problem is? Do you think I can find a generic solution?
Thanks in advance!

Read_reply in tarantool-c is too slow

I am setting up a C server and use Tarantool as the database via tarantool-c. However, every time I set up read_reply() the requests per second tank so much that it's like using MySQL. How do I fix it?
We had a discussion with James and he shared the code. The code implements an HTTP server, and this is how it processes a request:
1. Accept a new incoming HTTP request.
2. Send a request to Tarantool (using the binary protocol).
3. Wait for a reply from Tarantool (synchronously, unable to handle other incoming HTTP requests).
4. Answer the HTTP request.
The root of the problem here is that we are unable to utilize the full network bandwidth between the HTTP server and Tarantool. Such a server should use select() / poll() / epoll() (on Linux) / kqueue (on FreeBSD), or a library like libev, to determine whether it is able to write to a socket, read from it or accept a connection.
Let me describe briefly how it should operate to at least utilize the network as much as possible (when doing it from one thread), as a set of when-X-then-Y rules (see the sketch after this list):
- When a new HTTP request arrives, the server should register the need to send a request to Tarantool (let me call it a pending request).
- When the socket to Tarantool is ready for writing and there is at least one pending request, the server should form a request to Tarantool, save its ID (see tnt_stream.reqid) and write the request to the socket.
- When the socket to Tarantool is ready for reading, the server should read a reply from Tarantool, match its ID (see tnt_reply.sync) against the saved one and write a response to the HTTP client, then close the connection to the client.
- If HTTP keep-alive / pipelining needs to be supported, the server also needs to check the socket to an HTTP client for readiness to perform read() / write(), and it needs to register pending HTTP responses rather than writing them as soon as they appear.
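As a toy illustration of that readiness-based structure, here is a single-threaded loop using Python's selectors module standing in for epoll/libev; a real server would multiplex the Tarantool socket the same way and would not answer before Tarantool replies, and everything below is an assumption for illustration only:

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server_sock):
    # A new HTTP client is connecting: register it for read-readiness.
    client, _ = server_sock.accept()
    client.setblocking(False)
    sel.register(client, selectors.EVENT_READ, handle)

def handle(client):
    # The client socket is readable: in a real server this is where the
    # request would become a "pending request" towards Tarantool.
    data = client.recv(4096)
    if data:
        client.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    sel.unregister(client)
    client.close()

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8000))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

# One thread, many sockets: the loop only touches sockets that are ready.
while True:
    for key, _ in sel.select():
        key.data(key.fileobj)
```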
Aside from that, HTTP itself is not easy to implement properly: it will always surprise you if you don't control the implementations of both client and server.
So I will describe alternatives to implementing your own HTTP server that are compatible with Tarantool and able to operate with it asynchronously to achieve good performance. They are:
- Using the http.server Tarantool module, which allows processing HTTP requests right in Tarantool, without an external service and without using a connector to Tarantool.
- Using the nginx_upstream_tarantool nginx module, which allows performing a request to Tarantool from nginx using the binary protocol.
There are pros and cons to both approaches. However, the disadvantages of http.server can be overcome by using nginx as a frontend that proxies requests to the Tarantool instance(s), while keeping http.server's advantages.
http.server cons:
- No HTTPS support.
- Bound to a single CPU / a single Tarantool instance.
- Possibly worse performance than nginx (but not by much).
- Possibly worse handling of HTTP corner cases (though I don't know of a concrete one).
http.server pros:
- Simpler to start developing with.
- Simpler configuration / deployment: parts of the application configuration do not spread across configs for Tarantool and nginx.
The cons and pros of nginx_upstream_tarantool are the reverse of http.server's. Also, I would mention specifically that nginx allows you to balance load across several Tarantool instances, which may form a replication group or be sharding frontends. This ability can be used to scale a service to the desired performance, both when proxying to http.server and with nginx_upstream_tarantool.
I guess you are also interested in benchmarking results for http.server and nginx_upstream_tarantool. Look at this measurement. Note, however, that it is quite synthetic: it performs small requests and answers with small responses. Real RPS numbers may differ depending on the size of requests and responses, the hardware, whether HTTPS is needed, etc.

High Performance Options for Remote services access

I have a service, foo, running on machine A. I need to access that service from machine B. One way is to launch a web server on A and do it via HTTP; code running under the web server on A accesses foo and returns the results. Another is to write a socket server on A; the socket server accesses service foo and returns the result.
HTTP connection initiation and handshaking are expensive; a raw socket server could be written, but I would like to avoid that. What other options are available for high-performance remote calls?
HTTP is just the protocol over the socket. If you are using TCP/IP networks, you are going to be using a socket. The HTTP connection initiation and handshake are not the expensive bits; TCP connection setup is what really costs you.
If you use HTTP 1.1, you can use persistent connections (Keep-Alive), which drastically reduces this cost, bringing it closer to that of keeping a persistent socket open.
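As a quick client-side illustration (Python's requests library assumed here, and the URL is a placeholder): a Session object reuses the underlying TCP connection across calls, so only the first request pays the connection setup cost.

```python
import requests

session = requests.Session()   # keeps the HTTP/1.1 connection alive between calls
for _ in range(100):
    r = session.get("http://machine-a.example/foo")   # placeholder service URL
    r.raise_for_status()
```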
It all depends on whether you want/need the higher-level protocol. Using HTTP means you will be able to consume this service from a lot more clients while writing much less documentation (if you write your own protocol, you will have to document it). HTTP servers also support things like authentication, cookies and logging out of the box. If you don't need these sorts of capabilities, then HTTP might be a waste. But I've seen few projects that don't need at least some of them.
Adding to Rob's answer: as the question does not point precisely to an application or performance boundaries, it is worth looking at the options available in the broader context of inter-process communication.
The Wikipedia page cleanly lists the available options and is a good place to start.
What technology are you going to use? Let me answer for the Java world.
If your request rate is below 100/sec, you should not care about optimizations and should use the most versatile solution - HTTP.
A well-written asynchronous server like Netty-HTTP can easily handle 1000 requests per second on mid-range hardware.
If you need more, or have constrained resources, you can move to a binary format. The most popular one out there is Google Protobuf (multi-language) + Netty (Java).
What you should know about HTTP performance:
- HTTP can use Keep-Alive, which removes the reconnection cost for every request.
- HTTP adds traffic overhead to every request and response - around 50-100 bytes.
- HTTP clients and servers consume additional CPU parsing HTTP headers - this becomes noticeable above the aforementioned 100 req/sec.
Be careful when selecting a technology. Even in the 21st century it is hard to find a well-written HTTP server and client.

Understanding HTTPS connection setup overhead

I'm building a web-based chat app which will need to make an AJAX request for every message sent or received. I'd like the data to be encrypted and am leaning towards running AJAX (with long-polling) over HTTPS.
However, since the frequency of requests here is a lot higher than with basic web browsing, I'd like to get a better understanding of the overhead (network usage, time, server CPU, client CPU) in setting up the encrypted connection for each HTTPS request.
Aside from any general info/advice, I'm curious about:
As a very rough approximation, how much extra time does an HTTPS request take compared to HTTP? Assume content length of 1 byte and an average PC.
Will every AJAX request after the first have anything significant cached, allowing it to establish the connection quicker? If so, how much quicker?
Thank you in advance :-)
Everything in HTTPS is slower. Personal information shouldn't be cached, you have encryption on both ends, and an SSL handshake is relatively slow.
Long-polling will help. Long keep-alives are good. Enabling SSL sessions on your server will avoid a lot of the overhead as well.
The real trick is going to be doing load-balancing or any sort of legitimate caching. Not sure how much that will come into play in your system, being a chat server, but it's something to consider.
You'll get more information from this article.
Most of the overhead is in the handshake (exchanging certificates, checking for their revocation, ...). Session resumption and the recent False Start extension help in that respect.
In my experience, the worst-case scenario happens when using client-certificate authentication and advertising too many CAs (the CertificateRequest message sent by the server can even become too big); this is quite rare, since in practice, when you use client-certificate authentication, you would only accept client certificates from a limited number of CAs.
If you configure your server properly (for resources for which it's appropriate), you can also enable browser caching for resources served over HTTPS, using Cache-Control: public.
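For instance, if the server happened to be nginx (an assumption, not something stated above), session reuse and HTTPS caching boil down to a few directives; the values here are only illustrative:

```nginx
# Let returning clients resume TLS sessions instead of doing a full handshake.
ssl_session_cache   shared:SSL:10m;
ssl_session_timeout 10m;

# Allow browsers to cache public static assets even though they come over HTTPS.
location /static/ {
    add_header Cache-Control "public, max-age=3600";
}
```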

Is this chat using "long polling" or "http streaming"?

http://go-mono.com/moonlight/chat.aspx
It's not anything that simple. It uses http://www.mibbit.com/chat, which is a full IRC client written in Javascript and Java. Blog at http://blog.mibbit.com/.
Edit: Here's your answer.
The first part I got working was the communication between browser and server. That’s done using 2 XMLHttpRequests. The first one simply sends data from browser to server. It uses keep-alive to minimise new connections.
The second XHR is the ‘receive lazy polling’ one. It connects to the server, and the server holds it open until there are messages available or a timeout expires. This one is also keep-alive, so the next request goes down the same connection.
What you end up with is 2 connections held open to the server, with packets (JSON in this case) and some HTTP headers going over them from time to time.
To make sure the server would scale, I wrote a custom webserver in Java using NIO. It handles all of the connections in a single thread and, as I say, scales to tens of thousands of connections.
If the client requests a new connection, it sends a request to the webserver, which then connects out and starts proxying, etc. It also runs an ident server in the case of IRC connections so that an IRC server can identify individual browsers. I looked at existing frameworks to do this sort of thing, but I valued learning how it all works, and thought that my use case might be specific enough to allow more optimisation than general frameworks can.
