Which reverse proxies work for node.js socket.io apps? - performance

I currently have several apps running behind an Apache reverse proxy. I do this because I have one public IP address for multiple servers. I use VirtualHosts to proxy each hostname to the right service. For example:
<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    ServerName nagios.myoffice.com
    ProxyPass / http://nagios.myoffice.com/
    ProxyPassReverse / http://nagios.myoffice.com/
</VirtualHost>
This works fine for apps like PHP, Django and Rails, but I'd like to start experimenting with Node.js.
I've already noticed that apps behind the Apache proxy can't handle as high a load as when I access them directly, very likely because the Apache configuration is not ideal (not enough simultaneous connections, maybe).
One of the coolest features I'd like to experiment with in Node.js is socket.io, which I'm afraid will really expose the performance problem, especially because, as I understand it, socket.io will keep one of my precious few Apache connections open constantly.
Can you suggest a reverse proxy server for this situation that will let me use multiple virtual hosts and will not stifle the Node.js apps' performance too much or get in the way of socket.io experimentation?

I recommend node-http-proxy. Very active community and proven in production.
FEATURES
Reverse proxies incoming http.ServerRequest streams
Can be used as a CommonJS module in node.js
Uses event buffering to support application latency in proxied requests
Reverse or Forward Proxy based on simple JSON-based configuration
Supports WebSockets
Supports HTTPS
Minimal request overhead and latency
Full suite of functional tests
Battle-hardened through production usage @ nodejitsu.com
Written entirely in JavaScript
Easy to use API
Install using the following command
npm install http-proxy
See the project's GitHub page and npm page.

Although this introduces a new technology, I'd recommend using nginx as a front end. nginx is a fast and hardened server written in C that is quite good at reverse proxying. Like Node, it is event-driven and asynchronous.
You can use nginx to forward requests to the various Node.js servers you are running, either load-balancing between them or routing by URL (since it can do things like rewrites).
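Whichever front end is chosen (Apache VirtualHosts, nginx or node-http-proxy), name-based routing comes down to reading the request's Host header and picking a backend. The following is only a minimal sketch of that lookup in Python; the second hostname, the addresses and the ports are made up for illustration and are not from the question.

```python
from typing import Optional, Tuple

# Hypothetical routing table: public hostname -> internal (address, port).
# Only nagios.myoffice.com appears in the question; everything else is assumed.
BACKENDS = {
    "nagios.myoffice.com": ("10.0.0.5", 80),
    "chat.myoffice.com": ("127.0.0.1", 3000),  # e.g. a Node.js/socket.io app
}


def pick_backend(host_header: Optional[str]) -> Optional[Tuple[str, int]]:
    """Return the backend for a request's Host header, or None if unknown."""
    if not host_header:
        return None
    # Drop an optional ":port" suffix and normalise case before the lookup.
    # (IPv6 literals are not handled in this sketch.)
    hostname = host_header.split(":", 1)[0].strip().lower()
    return BACKENDS.get(hostname)
```

A real proxy would then open a connection to the returned address and relay the request; for socket.io that connection also has to stay open for the lifetime of the socket.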

Related

Expensive AWS load balancer, perhaps wrong setup

Some time ago, I needed HTTPS support for my Express web server. I found a tutorial that taught me a neat trick to achieve this: an AWS load balancer can accept HTTPS and forward the traffic to the instance as plain HTTP.
So, I first created a load balancer.
Then I forwarded HTTPS to HTTP. For plain HTTP, I just forwarded port 80 to port 80. I also have a WebSocket (socket.io) service on port 1337 (which I plan to change to port 1338 in the near future).
To be clear, I didn't really need a load balancer, since I only have one AWS instance. But with this setup I did not have to mess around with HTTPS certificate files, nor did I have to upgrade my web server. It saved me a lot of trouble at first.
Then this morning I received the bill and discovered that this load-balancing trick costs roughly 22 USD/month (an expensive port-forwarding trick).
I probably have to get rid of this load balancer, but I am wondering whether I did something wrong in the configuration.
It's strange that charges are so high for a web app that is still in development, which leads me to the following question.
I noticed that I am actually using an old ELB setup: "Classic load balancer". And it actually states that this setup does not support websockets, which is a bit strange.
My web app hosts some static webpages (angular), but once it is downloaded, all traffic uses socket.io websockets. Even though the AWS documentation says that websockets are not supported, it seems to work fine. Unless ...
Now, socket.io is pretty smart. When it can't use WebSockets (e.g. because the web browser does not support them), it falls back to a kind of HTTP polling. I guess that means that, from the load balancer's point of view, it creates hundreds of requests per minute. Right now I am wondering whether that influences the charges.
My really long question comes down to a simple one: do you think upgrading my load balancer would decrease the number of counted "load balancer hours"?
EDIT
Here are some ELB metrics. They are too complicated for me to draw conclusions. But perhaps some of you experts can. :)
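On the "load balancer hours" question: as far as I know, ELB bills an hour for every hour (or partial hour) the load balancer exists, independent of how many requests pass through it, with data processed charged separately. Under that assumption the base cost is simple arithmetic. The hourly rate below is an assumed figure and not taken from the post, so check the current AWS price list for your region.

```python
from decimal import Decimal

# Assumed figures, not from the post: verify against the AWS price list.
HOURLY_RATE_USD = Decimal("0.025")  # assumed price per load-balancer hour
HOURS_PER_MONTH = Decimal("730")    # average month: 365 * 24 / 12


def monthly_base_cost(hourly_rate: Decimal = HOURLY_RATE_USD,
                      hours: Decimal = HOURS_PER_MONTH) -> Decimal:
    """Cost of keeping one load balancer running, before data charges."""
    return hourly_rate * hours
```

With these assumed numbers the base charge alone is a bit over 18 USD a month, which is in the same range as the bill described above. If that is right, polling traffic would mostly affect the data portion of the bill, not the hours.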

What are issues with using WebSockets with proxies and load balancers?

I'm reading up on SockJS node server. Documentation says:
Often WebSockets don't play nicely with proxies and load balancers. Deploying a SockJS server behind Nginx or Apache could be painful. Fortunately recent versions of an excellent load balancer HAProxy are able to proxy WebSocket connections. We propose to put HAProxy as a front line load balancer and use it to split SockJS traffic from normal HTTP data.
I'm curious if anyone can expand on the problem that is being solved by HAProxy in this case? Specifically:
Why don't WebSockets play nicely with proxies and load balancers?
Why is deploying a SockJS server behind Apache painful?
1. Why don't WebSockets play nicely with proxies and load balancers?
I'd recommend you read this article on How HTML5 Web Sockets Interact With Proxy Servers by Peter Lubbers. It should cover everything you need to know about WebSocket and proxies - and thus, load balancers.
2. Why is deploying a SockJS server behind Apache painful?
There is a module for handling WebSocket connections, but at present Apache doesn't natively support WebSocket, nor does it look like it will any time soon, based on this bug filed against Apache: HTML5 Websocket implementation. The suggestion there is that WebSocket support actually fits the module pattern better.
So it's "painful" simply because it's not easy: there's no official support, and therefore it doesn't have the community use that it might otherwise have.
There may also be other pains in that SockJS has HTTP-based fallback transports, so you need to proxy both the WebSocket connections (using the apache-websocket module) and the HTTP requests used when a fallback is in play.
Related to this: Nginx v1.3 was released in February with WebSocket support.
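To make the proxy problem more concrete: a WebSocket connection starts as an ordinary HTTP request carrying "Connection: Upgrade" and "Upgrade: websocket" headers. Both are hop-by-hop headers, so an intermediary that does not understand the upgrade may drop them, and the handshake then fails. The sketch below shows the two checks from RFC 6455 that a WebSocket-aware server or proxy relies on. The function names are my own, not part of SockJS or any proxy.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def is_websocket_upgrade(headers: dict) -> bool:
    """True if the request headers ask to switch protocols to WebSocket."""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    # Connection may carry several tokens, e.g. "keep-alive, Upgrade".
    tokens = [t.strip() for t in lowered.get("connection", "").split(",")]
    return "upgrade" in tokens and lowered.get("upgrade") == "websocket"


def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value the server must send back."""
    raw = (sec_websocket_key + WS_GUID).encode("ascii")
    return base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")
```

A proxy that forwards the request without the Upgrade header makes is_websocket_upgrade() false at the backend, which is the point where libraries like SockJS or socket.io fall back to HTTP polling.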

Is it better to use CORS or nginx proxy_pass for a RESTful client-server app?

I have a client-server app where the server is a Ruby on Rails app that renders JSON and understands RESTful requests. It's served by nginx+passenger and its address is api.whatever.com.
The client is an AngularJS application that consumes these services. It is served by a second nginx server and its address is whatever.com.
I can either use CORS for cross-subdomain AJAX calls or configure the client's nginx to proxy_pass requests to the Rails application.
Which one is better in terms of performance and less trouble for developers and server admins?
Unless you're Facebook, you are not going to notice any performance hit from having an extra reverse proxy. The overhead is tiny: it's basically parsing a bunch of bytes and then sending them over a local socket to another process. A reverse proxy in nginx is easy enough to set up, so it's unlikely to be an administrative burden.
You should worry more about browser support. CORS is supported on almost every browser, except of course for Internet Explorer and some mobile browsers.
Juvia uses CORS but falls back to JSONP. No reverse proxy setup.
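For comparison, the CORS route means the API server has to answer each cross-origin request with the right response headers. Here is a minimal, framework-neutral sketch of that decision in Python. The allow-list uses the client origin from the question, while the function name and the method and header lists are assumptions for illustration only.

```python
# Hypothetical allow-list: the client origin from the question.
ALLOWED_ORIGINS = {"http://whatever.com", "https://whatever.com"}


def cors_headers(origin, allowed=ALLOWED_ORIGINS):
    """Return the CORS response headers for a request's Origin header."""
    if origin not in allowed:
        return {}  # no CORS headers: the browser will block the response
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "GET, POST, PUT, DELETE, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type",
        "Vary": "Origin",  # caches must not reuse the response across origins
    }
```

With the proxy_pass approach none of this is needed, because the browser only ever talks to whatever.com.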

How to build local web proxy without configuring the browsers

How do Netnanny or K9 Web Protection set up a web proxy without configuring the browsers?
How can it be done?
By using WinSock directly, or by hooking in at the NDIS or hardware driver level, and then filtering at that level, just like any firewall software does. NDIS is the easy way.
Download this ISO image: http://www.microsoft.com/downloads/en/confirmation.aspx?displaylang=en&FamilyID=36a2630f-5d56-43b5-b996-7633f2ec14ff
it has bunch of samples and tools to help you build what you want.
After you mount or burn it on CD and install it go to this folder:
c:\WinDDK\7600.16385.1\src\network\ndis\
I think what you need is a transparent proxy that supports WCCP.
Take a look at squid-cache FAQ page
And the Wikipedia entry for WCCP
With that setup you just need to do some firewall configuration and all your web traffic will be handled by the transparent proxy. And no setup will be needed on your browser.
Netnanny is not a proxy. It is tied to the host machine and browser (and possibly other applications as well). It then filters all incoming and outgoing "content" from the machine/application.
Essentially, Netnanny is a content-control system as opposed to a destination-control system (proxy).
The easiest way to divert all traffic for a certain site to some other address is to change the hosts file on the local host.
You might want to have a look at the explanation here: http://www.fiddlertool.com/fiddler/help/hookup.asp
This is how Fiddler2 manages to insert a proxy between most apps and the internet without modifying the apps (although the page spends a lot of time explaining what to do when the default setup fails). This does not answer how NetNanny/K9 etc. work, though; as noted above, they do a little more and may be a little more intrusive.
I believe you are looking for Browser Helper Objects. These little gizmos capture ALL browser communication, and as such can either remove ads from the HTML (good gizmo), redirect every second click to a spam site (bad gizmo), or just capture every URL you type and send it home, like all the web toolbars do.
What you want to do is route all outgoing HTTP(S) requests from your LAN through a proxy (like Squid). This is the setup for a transparent web proxy.
There are different ways to do this, although I've only ever set it up on OpenBSD and Linux, using Squid as the proxy.
At a high level you have a firewall with rules to send all externally bound http traffic to a local squid server. The Squid server is configured to:
accept all http requests
forward the requests on to the real external hosts
cache the reply
forward the reply back to the requestor on the local lan
You can then add more granular rules in Squid to control access to websites, filter content, etc.
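The four steps above, plus a simple access rule, can be modelled in a few lines. This is only a toy sketch of the flow, not how Squid is implemented. The class name is made up, and the fetch callable stands in for the real request to the external host.

```python
from typing import Callable, Dict, Iterable


class CachingProxy:
    """Toy model of the steps: accept, forward, cache, reply."""

    def __init__(self, fetch: Callable[[str], bytes],
                 blocked_hosts: Iterable[str] = ()):
        self.fetch = fetch                       # stand-in for the upstream request
        self.blocked_hosts = set(blocked_hosts)  # stand-in for access rules
        self.cache: Dict[str, bytes] = {}

    def handle(self, host: str, path: str) -> bytes:
        if host in self.blocked_hosts:           # granular access rule
            return b"403 Forbidden"
        url = f"http://{host}{path}"
        if url not in self.cache:                # cache miss: ask the real host
            self.cache[url] = self.fetch(url)
        return self.cache[url]                   # reply to the client on the LAN
```

The firewall's job in the real setup is to make sure every outbound HTTP connection reaches handle() without the browser knowing about it.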
I'm pretty sure you can also get this functionality in different networking gear. I bet F5 has some products that do some or all of what I described, and probably Cisco as well. There are probably other proxies besides Squid that you can use too.
PS. I have no idea if this is how K9 Web Protection or NetNanny works.
Squid can act as an intercepting proxy for the HTTP and HTTPS ports without configuring the browsers, and it also supports WCCP.

IIS cache with PURGE support

On Unix, I normally deploy nginx in front of Varnish in front of my application server. Both nginx and Varnish are acting as reverse proxies here. Varnish maintains a cache and supports things like If-Modified-Since, Cache-Control response headers and PURGE requests from the application. nginx is good at receiving a lot of connections. I also use it to serve some static content, enable gzip compression etc.
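To pin down what is being asked of the cache layer, here is a small in-memory sketch of the two behaviours mentioned: answering If-Modified-Since with 304, and dropping an entry on PURGE. PURGE is a convention used by caches such as Varnish and not a standard HTTP method. The class and its methods are hypothetical and not part of any of the products named here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


class TinyCache:
    """In-memory sketch of a cache that honours If-Modified-Since and PURGE."""

    def __init__(self):
        self.entries = {}  # path -> (last_modified, body)

    def store(self, path, last_modified, body):
        self.entries[path] = (last_modified, body)

    def handle(self, method, path, if_modified_since=None):
        """Return (status, body); a status of None means a cache miss."""
        if method == "PURGE":
            # Drop the entry so the next GET goes back to the application.
            return (200, b"") if self.entries.pop(path, None) else (404, b"")
        if path not in self.entries:
            return (None, b"")
        last_modified, body = self.entries[path]
        if if_modified_since is not None:
            since = parsedate_to_datetime(if_modified_since)
            # HTTP dates have one-second resolution, so compare whole seconds.
            if last_modified.replace(microsecond=0) <= since:
                return (304, b"")
        return (200, body)


# Usage: store an entry, then build the header a browser would send back.
modified = datetime(2015, 10, 21, 7, 28, 0, tzinfo=timezone.utc)
cache = TinyCache()
cache.store("/logo.png", modified, b"png-bytes")
header = format_datetime(modified, usegmt=True)
```

Whatever ends up in front of IIS would need the equivalent of both branches for the setup described here to carry over from Unix.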
On Windows, I can manage with Squid in front of IIS. I'm planning to deploy my (Python) application as an ISAPI wildcard filter (using the isapi-wsgi package), so the application will live in a thread pool managed by IIS.
However, Squid development on Windows appears to have stalled, and I'd prefer to keep IIS on port 80, so that I can serve certain things directly from disk. I also suspect IIS is more resilient in handling lots of connections than Squid on Windows.
What do people normally use here? One option would be to use another free-standing caching proxy in front of IIS. Another option may be something installed as an ISAPI filter, which would intercept requests and respond to things like If-Modified-Since, requests for images and other cached resources, and PURGE requests from the application.
Does such a thing exist? Or are the only real choices Squid and MS ISA (too expensive)?
Cheers,
Martin
IIS7 with Application Request Routing (see http://www.iis.net/download/ApplicationRequestRouting) supports full proxy caching on the same box or with the cache server in front of your middle tier.
Once ARR is installed, to enable proxy caching from the command line run the following:
%windir%\System32\inetsrv\appcmd.exe set config -section:system.webServer/diskCache /+"[path='C:\MyCacheFolder',maxUsage='0']" /commit:apphost
To vary caching based on query string, execute the following:
%windir%\System32\inetsrv\appcmd.exe set config -section:system.webServer/proxy /cache.queryStringHandling:"Accept" /commit:apphost
See the documentation link above for more details. Notice that static and dynamic content can have different caching strategies, etc. If you pursue using this, follow up with specific questions--it can be a bit of a trick lining everything up if you're looking for fine-grained control.

Resources