I'm reading up on SockJS node server. Documentation says:
Often WebSockets don't play nicely with proxies and load balancers. Deploying a SockJS server behind Nginx or Apache could be painful. Fortunately recent versions of an excellent load balancer HAProxy are able to proxy WebSocket connections. We propose to put HAProxy as a front line load balancer and use it to split SockJS traffic from normal HTTP data.
I'm curious if anyone can expand on the problem that is being solved by HAProxy in this case? Specifically:
Why websockets don't play nice with proxies and load balancers?
Why deploying Sockjs sever behind Apache is painful?
1. Why websockets don't play nice with proxies and load balancers?
I'd recommend you read this article on How HTML5 Web Sockets Interact With Proxy Servers by Peter Lubbers. It should cover everything you need to know about WebSocket and proxies - and thus, load balancers.
2. Why deploying Sockjs sever behind Apache is painful?
There is a module for handling WebSocket connections but at present Apache doesn't natively support WebSocket, nor does it look like it will any time soon based on this bug filed on apache - HTML5 Websocket implementation. The suggestion is that it actually fits the module pattern better.
So, it's "painful" simply because it's not easy - there's no official support and therefore it doesn't have the community use that it otherwise may have.
There may also be other pains in the SockJS has HTTP-based fallback transports. So you need to proxy both the WebSocket connections (using the apache-websocket module) and also HTTP requests when fallback is used.
Related to this: Nginx v1.3 was released in February with WebSocket support.
Related
Is there any CDN provider able to offer some sort of WebSocket caching either via "Edge Workers" or other mechanism?
I would like that some WebSocket requests that we make to be able to serve them as close to the user as possible without us setting up our entire infrastructure in multiple datacenters.
No, there is no way to cache WebSocket messages - e.g. the protocol does not have any "addressable" resources as e.g. HTTP. HTTP has more built in design for caching in the protocol, e.g. GET requests are typically cacheable but not POST requests and you can add cache related headers to the requests, e.g. how long it can be cached and so on.
For this reason, I would recommend to use HTTP instead of WebSocket unless you have a use case where you really have to use WebSockets. With HTTP you have access to all CDN and proxy infrastructure that is available globally.
It is getting more and more common to use e.g. Server Sent Events, HTTP/2 streams or possibly GRPC for use cases where WebSockets was used before. The upcoming WebTransport protocol will probably be the replacement(?) - at least when most other traffic is HTTP/3.
Seconded, caching WebSocket messages is not possible/not recommended. I would encourage you to look into transforming your WebSocket messaging into a form of Edge Computing.
For instance:
Akamai offers EdgeWorkers.
CloudFlare offers Workers.
AWS offers Lambda.
This might offer you the ability to handle your workflow instead of relying on WebSockets.
/Mike (and yes, I work at Akamai)
I have a process that runs in California that wants to talk to a process in New York, using Stomp over Websockets.
Also note that my process is not a web app, but I implemented a stomp over websocket client in C++, in order to connect things up to my backend. Maybe this was or wasn't a good idea. So, I want my client to talk to the server and subscribe, where their client pushed messages.
I was implementing my own server when I saw that ApacheMQ supported Stomp over Websockets. So, I started reading the docs.
It says with the last line under 'configuration' at
http://activemq.apache.org/websockets :
One thing worth noting is that web sockets (just as Ajax) implements ? > the same origin policy, so you can access only brokers running on the > same host as the web application running the client.
it says it again in several related searches such as http://sensatic.net/activemq/activemq-54-stomp-over-web-sockets.html
Is this a limitation of the server or the web client?
With that limitation, if I understand right, the server is not going to accept websocket connections from a client, of any kind, that is not on the same machine?
I am not sure I see the point of that...
If that is indeed its meaning, then how do I get around it in order to implement my scenario?
I've not found that bit of documentation you are referring to but from what I know of the STOMP implementation on the broker this seems incorrect. There shouldn't be any limit to the transport connector accepting connect requests from an outside host by default and I don't think the browser treats the websocket requests the same as it does other things like an Ajax case in terms of the same origin policy.
This probably a case that is best checked by actually trying it to see if it works, I've connected just fine from outside the same host using AMQP over websockets on ActiveMQ so I'd guess the STOMP stack should also work fine.
I'm trying to get a better understanding of how the server-side architecture works for WebSockets with the goal of implementing it in an embedded application. It seems that there are 3 different server-side software components in play here: 1) the web server to serve static HTTP pages and handle upgrade request, 2) a WebSockets library such as libwebsockets to handle the "nuts and bolts" of WebSockets communications, and 3) my custom application to actually figure out what to do with incoming data. How do all these fit together? Is it common to have a separate web server and WebSocket handling piece, aka a WebSocket server/daemon?
How does my application communicate with the web server and/or WebSockets library to send/receive data? For example, with CGI, the web server uses environmental variables to send info to the custom application, and stdout to receive responses. What is the equivalent communication system here? Or do you typically link in a WebSocket library into the customer application? But then how would communication with the web server to the WebSocket library + custom application work? Or all 3 combined into a single component?
Here's why I am asking. I'm using the boa web server on a uClinux/no MMU platform on a Blackfin processor with limited memory. There is no native WebSocket support in boa, only CGI. I'm trying to figure out how I can add WebSockets support to that. I would prefer to use a compiled solution as opposed to something interpreted such as JavaScript, Python or PHP. My current application using long polling over CGI, which does not provide adequate performance for planned enhancements.
First off, it's important to understand how a webSocket connection is established because that plays into an important relationship between webSocket connections and your web server.
Every webSocket connection starts with an HTTP request. The browser sends an HTTP request to the host/port that the webSocket connection is requested on. That request might look something like this:
GET /chat HTTP/1.1
Host: example.com:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
What distinguishes this request from any other HTTP request to that server is the Upgrade: websocket header in the request. This tells the HTTP server that this particular request is actually a request to initiate a webSocket connection. This header also allows the web server to tell the difference between a regular HTTP request and a request to open a webSocket connection. This allows something very important in the architecture and it was done this way entirely on purpose. This allows the exact same server and port to be used for both serving your web requests and for webSocket connections. All that is needed is a component on your web server that looks for this Upgrade header on all incoming HTTP connections and, if found, it takes over the connection and turns it into a webSocket connection.
Once the server recognizes this upgrade header, it responds with a legal HTTP response, but one that signals the client that the upgrade to the webSocket protocol has been accepted that looks like this:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
At that point, both client and server keep that socket from the original HTTP request open and both switch to the webSocket protocol.
Now, to your specific questions:
How does my application communicate with the web server and/or
WebSockets library to send/receive data?
Your application may use the built-in webSocket support in modern browsers and can initiate a webSocket connection like this:
var socket = new WebSocket("ws://www.example.com");
This will instruct the browser to initiate a webSocket connection to www.example.com use the same port that the current web page was connected with. Because of the built-in webSocket support in the browser, the above HTTP request and upgrade protocol is handled for you automatically from the client.
On the server-side of things, you need to make sure you are using a web server that has incoming webSocket support and that the support is enabled and configured. Because a webSocket connection is a continuous connection once established, it does not really follow the CGI model at all. There must be at least one long-running process handling live webSocket connections. In server models (like CGI), you would need some sort of webServer add-on that supports this long-running process for your webSocket connections. In a server environment like node.js which is already a long running process, the addition of webSockets is no change at all architecturally - but rather just an additional library to support the webSocket protocol.
I'd suggest you may find this article interesting as it discussions this transition from CGI-style single request handling to the continuous socket connections of webSocket:
Web Evolution: from CGI to Websockets (and how it will help you better monitor your cloud infrastructure)
If you really want to stick with the stdin/stdout model, there are libraries that model that for your for webSockets. Here's one such library. Their tagline is "It's like CGI, twenty years later, for WebSockets".
I'm trying to figure out how I can add WebSockets support to that. I
would prefer to use a compiled solution as opposed to something
interpreted such as JavaScript, Python or PHP.
Sorry, but I'm not familiar with that particular server environment. It will likely take some in-depth searching to find out what your options are. Since a webSocket connection is a continuous connection, then you will need a process that is running continuously that can be the server-side part of the webSocket connection. This can either be something built into your webServer or it can be an additional process that the webServer starts up and forwards incoming connections to.
FYI, I have a custom application at home here built on a Raspberry Pi that uses webSockets for real-time communication with browser web pages and it works just fine. I happen to be using node.js for the server environment and the socket.io library that runs on top of webSockets to give me a higher level interface on top of webSockets. My server code checks several hardware sensors on a regular interval and then whenever there is new/changed data to report, it sends messages down any open webSockets so the connected browsers get real-time updates on the sensor readings.
You would likely need some long-running application that incoming webSocket connections were passed from the web server to your long running process or you'd need to make the webSocket connections on a different port than your web server (so they could be fielded by a completely different server process) in which case you'd have a whole separate server to handle your webSocket requests and sockets (this server would also have to support CORS to enable browsers to connect to it since it would be a different port than your web pages).
I have currently several apps running that are behind an Apache reverse proxy. I do this because I have one public IP address for multiple servers. I use VirtualHosts to proxy the right app to the right service. For example:
<VirtualHost *:80>
ServerAdmin webmaster#localhost
ServerName nagios.myoffice.com
ProxyPass / http://nagios.myoffice.com/
ProxyPassReverse / http://nagios.myoffice.com/
</VirtualHost>
This works fine for apps like PHP, Django and Rails, but I'd like to start experimenting with Node.js.
I've already noticed that apps that are behind the Apache proxy can't handle as high of a load as when I access them directly. Very likely because the Apache configuration is not ideal (not enough simultaneous connections maybe).
One of the coolest features I'd like to experiment with in node.js is the socket.io capabilities which I'm afraid will really expose the performance problem. Especially because, as I understand it, socket.io will keep one of my precious few Apache connections open constantly.
Can you suggest a reverse proxy server I can use in this situation that will let me use multiple virtualhosts and will not stifle the node.js apps performance too much or get in the way of socket.io experimentation?
I recommend node-http-proxy. Very active community and proven in production.
FEATURES
Reverse proxies incoming http.ServerRequest streams
Can be used as a CommonJS module in node.js
Uses event buffering to support application latency in proxied requests
Reverse or Forward Proxy based on simple JSON-based configuration
Supports WebSockets
Supports HTTPS
Minimal request overhead and latency
Full suite of functional tests
Battled-hardened through production usage # [nodejitsu.com][0]
Written entirely in Javascript
Easy to use API
Install using the following command
npm install http-proxy
Here is the Github page and the NPM page
Although this introduces a new technology, I'd recommend using nginx as a front-end. nginx is a fast and hardened server written in c that is quite good at reverse proxying. Like node, it is event-driven and asynchronous.
You can use nginx to forward requests to various nodejs servers you are running, either load-balancing, or depending on the url (since it can do things like rewrites).
The WebSocket standard hasn't been ratified yet, however from the draft it appears that the technology is meant to be implemented in Web servers. pywebsocket implements a WebSocket server which can be dedicated or loaded as Apache plugin.
So what I am am wondering is: what's the ideal use of WebSockets? Does it make any sense to implement a service using as dedicated WebSocket servers or is it better to rethink it to run on top of WebSocket-enabled Web server?
The WebSocket protocol was designed with three models in mind:
A WebSocket server running completely separately from any web server.
A WebSocket server running separately from a web server, but with traffic proxied to the websocket server from the web server (allowing websocket and HTTP traffic to co-exist on the same port)
A WebSocket server running as a plugin in the web server.
The model you pick really depends on the application you are trying to build and some other constraints that may limit your choices.
For example, if your application is going to be served from a single web server and the WebSocket connection will always be back to that same server, then it probably makes sense to just run the WebSocket server as a plugin/module in the web server.
On the other hand if you have a general WebSocket service that is usable from many different web sites (for example, you could have continuous low-latency traffic updates served from a WebSocket server), then you probably want to run the WebSocket server separate from any web server.
Basically, the tighter the integration between your WebSocket service and your web service, the more likely you will want to run them together and on the same port.
There are some constraints that may force one model or another:
If you control the server(s) but not the incoming firewall rules, then you probably have no choice but to run the WebSocket server on the same port(s) as your HTTP/HTTPS server (e.g. 80 and 443). In which case you will have to use a web server plugin or proxy to the real WebSocket server.
On the other hand, if you do not have super-user permission on the server where you are running the WebSocket server, then you will probably not be able to use ports 80 and 443 (below 1024 is generally a privileged port range) and in that case it really doesn't matter whether you run the HTTP/S and WebSocket servers on the same port or not.
If you have cookie based authentication (such as OAuth) in the web server and you would like to re-use this for the WebSocket connections then you will probably want to run them together (special case of tight integration).