websocket scalability - performance

I have looked around a bit on websockets, and I have a pretty concrete question:
Can websockets actually be scaled over different servers, or are they always limited to one single server?
This seems to be an issue I've repeatedly bumped into in the docs I have found, but maybe they were incomplete or things have evolved. For example, it seems that Heroku doesn't even support WebSockets at all(?)

It depends on your application, but in general there is no reason you can't load-balance WebSocket connections across multiple machines in the same way as any other TCP connection.
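To make that concrete, here is a minimal sketch of the pub/sub "backplane" pattern that usually sits behind such a setup, written in Python with the redis-py package purely for illustration (the local Redis instance, the channel name, and the broadcast_local() stub are assumptions, not part of any particular framework): every node accepts whatever connections the load balancer hands it, publishes incoming messages to Redis, and relays whatever it receives from Redis to its own local connections.

import asyncio

import redis.asyncio as redis  # redis-py; assumes a Redis server on localhost:6379

CHANNEL = "broadcast"  # hypothetical channel name shared by all nodes

async def broadcast_local(message: str) -> None:
    # Stand-in for "write this message to every WebSocket connection
    # currently held by this process".
    print("deliver to local clients:", message)

async def run_node(name: str) -> None:
    r = redis.Redis()
    pubsub = r.pubsub()
    await pubsub.subscribe(CHANNEL)

    async def relay() -> None:
        # Everything published by any node shows up here on every node.
        async for msg in pubsub.listen():
            if msg["type"] == "message":
                await broadcast_local(msg["data"].decode())

    relay_task = asyncio.create_task(relay())

    # Simulate a client message arriving at this node: publish it so that
    # all nodes (including this one) fan it out to their own clients.
    await r.publish(CHANNEL, f"hello from {name}")
    await asyncio.sleep(1)
    relay_task.cancel()

asyncio.run(run_node("node-1"))

The Phoenix PubSub Redis adapter and the Tornado/Redis chat gist mentioned further down this page are essentially productionized versions of the same idea.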

Related

Scaling phoenix on heroku

I don't have a ton of experience with Heroku, and even less with Phoenix, so this may be a stupid question... but I want to make sure I am making a good choice on hosting :)
From what I understand, the way you scale Phoenix is to add another server, launch another node, and connect them, then let BEAM/OTP work its magic to handle load balancing. On Heroku, dynos can't really talk to each other over a local network, which from what I understand is something BEAM requires for clustering. So adding dynos will result in a more "traditional" scaling model, where you have an external load balancer distributing connections between unconnected nodes, with the db as the shared state.
My question is: how big of an impact will this have? Is it only an issue when you are hitting serious levels of load/scale, or will it mean spending a lot more money on infrastructure than is necessary?
You'll get the best performance on a host that supports clustering, but Phoenix has a PubSub adapter system exactly for deployments like Heroku:
https://github.com/phoenixframework/phoenix_pubsub
A one-line config change and a mix.exs deps entry and you'll have multi-node channels on Heroku via our Redis adapter.
This is a very open question, so I am sure my answer won't be comprehensive.
In your situation the most important question is: will I use Phoenix channels?
If you use plain old HTTP, it can be mostly stateless. There are lots of ways to simulate a stateful connection, like storing sessions in cookies. At the end of the day, it doesn't matter whether your backend servers are connected with each other, because each of them does independent computations. Your load balancer can select any server at random and it will always work. This property of HTTP is what lets the protocol scale so well. You can definitely use Heroku in that scenario and it will work great.
If you use Phoenix channels, things get complicated. You still want to be able to connect to any of the servers, but you will probably send messages to other users in real time, and they can be connected to other servers. Phoenix solves this problem for you by clustering with BEAM, and that will be hard, or even impossible, on Heroku.
To sum up: it is not a question of small scale versus big scale; it is a question of features. Scaling channels will require clustering; scaling plain old HTTP will not.

Should we prefer SSE + REST over websocket when using HTTP/2?

When using WebSocket, we need a dedicated connection for bidirectional communication. If we use HTTP/2, we have a second stream maintained by the server within the same connection.
In that case, using WebSocket seems to introduce unnecessary overhead, because with SSE and regular HTTP requests we can have bidirectional communication over a single HTTP/2 connection.
What do you think?
Using 2 streams in one multiplexed HTTP/2 TCP connection (one stream for server-to-client communication - Server Sent Events (SSE), and one stream for client-to-server communication and normal HTTP communication) versus using 2 TCP connections (one for normal HTTP communication and one for WebSocket) is not easy to compare.
Your mileage will probably vary depending on the application.
Overhead? Well, certainly the number of connections doubles.
However, WebSocket can compress messages, while SSE cannot.
Flexibility? If the connections are separate, they can use different encryption settings. HTTP/2 typically requires very strong encryption, which may limit performance.
On the other hand, WebSocket does not require TLS.
Does clear-text WebSocket work in mobile networks? In my experience, it depends. Antivirus software, application firewalls, and mobile operators may limit WebSocket traffic, or make it less reliable, depending on the country you operate in.
API availability? WebSocket is a more widely deployed and recognized standard; for example, in Java there is an official API (javax.websocket) and another is coming up (java.net.websocket).
I think SSE is a technically inferior solution for bidirectional web communication, and as a technology it did not become very popular (no standard APIs, no books, etc., in comparison with WebSocket).
I would not be surprised if it gets dropped from HTML5, and I would not miss it, despite being one of the first to implement it in Jetty.
Depending on what you are interested in, you have to run your own benchmarks or evaluate the technology for your particular case.
From the perspective of a web developer, the difference between Websockets and a REST interface is semantics. REST uses a request/response model where every message from the server is the response to a message from the client. WebSockets, on the other hand, allow both the server and the client to push messages at any time without any relation to a previous request.
Which technique to use depends on what makes more sense in the context of your application. Sure, you can use some tricks to simulate the behavior of one technology with the other, but it is usually preferable to use the one that fits your communication model better when used by the book.
Server-sent events are a rather new technology that isn't yet supported by all major browsers, so they are not yet an option for a serious web application.
It depends a lot on what kind of application you want to implement. WebSocket is more suitable if you really need bidirectional communication between server and client, but you will have to implement the communication protocol yourself, and it might not be well supported by all IT infrastructure (some firewalls, proxies, or load balancers may not support WebSockets). So if you do not need a 100% bidirectional link, I would advise using SSE with REST requests for the additional information from client to server.
On the other hand, SSE comes with certain caveats. For instance, in the JavaScript implementation (EventSource) you cannot set custom headers; the only workaround is to pass query parameters, but then you can run into the query string size limit.
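To make the query-parameter workaround concrete, here is a minimal server-side sketch of an SSE endpoint that authenticates via the query string (Python's Tornado is used here only as an example server; the /events path, the token parameter, and check_token() are illustrative assumptions):

import asyncio

import tornado.web

def check_token(token: str) -> bool:
    return token == "secret"  # placeholder for real validation

class EventsHandler(tornado.web.RequestHandler):
    async def get(self):
        # EventSource cannot send custom headers, so the token arrives
        # as a query parameter instead of an Authorization header.
        if not check_token(self.get_query_argument("token", "")):
            raise tornado.web.HTTPError(403)
        self.set_header("Content-Type", "text/event-stream")
        self.set_header("Cache-Control", "no-cache")
        for i in range(5):                     # a few demo events
            self.write(f"data: tick {i}\n\n")  # SSE wire format
            await self.flush()                 # push the event immediately
            await asyncio.sleep(1)

async def main():
    tornado.web.Application([(r"/events", EventsHandler)]).listen(8888)
    await asyncio.Event().wait()

if __name__ == "__main__":
    asyncio.run(main())

On the client side this would be consumed with something like new EventSource('/events?token=secret'), which is exactly where the query string size limit mentioned above can start to bite.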
So, again, choosing between SSE and WebSockets really depends on the kind of application you need to implement.
A few months ago, I wrote a blog post that may give you some information: http://streamdata.io/blog/push-sse-vs-websockets/. Although at that time we didn't consider HTTP/2, it can help you figure out what questions you need to ask yourself.

SPA: using websockets only. Why not?

I am redesigning a web application, which previously was rendered server side, as a Single Page Application, and I have started to read about WebSockets. The web application will be using sockets to have new records and/or messages pushed to the client. I have been wondering why most pages that make use of sockets don't handle all their communication over the socket; most of the time there is a RESTful backend in addition to the WebSocket. Would it be a bad idea to have the client query for new resources over the socket? If so, why (other than that a RESTful API might be easier to use with other devices)?
I can imagine that using WebSockets would probably not be the best idea when the network connection is poor, like on mobile devices, but it should work quite well with a reasonable connection.
I found this related question, but it is from 2011 and seems a little outdated:
websocket api to replace rest api?
No, it won't be a bad idea. Actually, I work on an application that uses a WebSocket connection for all data interaction; the web server only handles requests for resources: views in different languages, dimensions, etc.
The problem may be the lack of frameworks/tools based on a persistent connection. For many years most frameworks, front end and back end, have been designed and built around the request/response model. The shift in approach may not be so easy to accept.
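If you do go socket-only, the usual approach is to layer a small request/response convention on top of the socket yourself, for example by tagging every message with an id and an action so responses can be correlated with requests. A minimal sketch follows (Python/Tornado is chosen purely for illustration; the field names and the get_time action are assumptions, not a standard protocol):

import asyncio
import json
import time

import tornado.web
import tornado.websocket

class ApiSocket(tornado.websocket.WebSocketHandler):
    def on_message(self, message):
        # The client sends {"id": ..., "action": ...}; dispatch on the action.
        request = json.loads(message)
        if request.get("action") == "get_time":
            result = {"now": time.time()}
        else:
            result = {"error": "unknown action"}
        # Echo the request id back so the client can match the response.
        self.write_message(json.dumps({"id": request.get("id"), "result": result}))

async def main():
    tornado.web.Application([(r"/api", ApiSocket)]).listen(8888)
    await asyncio.Event().wait()

if __name__ == "__main__":
    asyncio.run(main())

The id field is exactly the kind of plumbing the next answer is getting at: you end up rebuilding conventions (routing, correlation, caching) that request/response frameworks give you for free.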
Coming back to this question a few years later, I would like to point out a few aspects to illustrate that having all your communication go through WebSockets does have its drawbacks:
there is no common support for compression. You can easily configure your web server to compress HTTP responses, and browsers have happily accepted compressed responses for years, but for WebSockets it is still not that easy, even though the situation has improved (see the sketch after this list).
client frameworks are often built upon commonly used standards like REST. The further you move away from a framework's expectations, the fewer add-ons or features will be available.
caching in the browser is not as easy. By now browser caching goes a long way, reaching into the realm of offline availability and PWAs.
when using technology that is only used by a subset of users, you are more likely to find new bugs, or bugs might take longer to get fixed. And if it's not bugs, there might be an edge case around the corner. This isn't an issue per se, but something to be aware of: when you run into these things, they can easily take quite some time to fix or work around.
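On the compression point: permessage-deflate is typically something you opt into explicitly on the WebSocket server rather than something your web server config gives you for free, and whether it is actually used still depends on what the client and any intermediaries negotiate. As one hedged example, in Python's Tornado (used here only for illustration) a handler enables it by overriding get_compression_options():

import asyncio

import tornado.web
import tornado.websocket

class CompressedEcho(tornado.websocket.WebSocketHandler):
    def get_compression_options(self):
        # Returning a dict (even an empty one) enables permessage-deflate
        # with default options; returning None (the default) leaves it off.
        return {}

    def on_message(self, message):
        # Plain echo; compressed on the wire only if the client negotiated it.
        self.write_message(message)

async def main():
    tornado.web.Application([(r"/ws", CompressedEcho)]).listen(8888)
    await asyncio.Event().wait()

if __name__ == "__main__":
    asyncio.run(main())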

Engine.io or SockJS, which one to choose?

I have run into trouble with Socket.io regarding memory leaks and scaling issues lately. My decision to use Socket.io was made over a year ago when it was undoubtedly the best library to use.
Now that Socket.io causes much trouble, I spent time looking for alternatives that became available in the meantime and think that both Engine.io and SockJS are generally well suited for me. However, in my opinion both have some disadvantages and I am not sure which one to choose.
Engine.io is basically the perfect lightweight version of Socket.io that does not contain all the features I do not require anyway. I have already written my own reconnection and heartbeat logic for Socket.io, because I was not satisfied with the default logic, and I never intended to use rooms or the other features that Socket.io offers.
But, in my opinion, the major disadvantage of Engine.io is the way connections are established. Clients start with slower JSONP polling and are upgraded if they support better transports. The fact that clients which support WebSockets natively (a steadily increasing number) are stuck with a longer and less stable connection procedure than clients on outdated browsers contradicts my sense of how it should be handled.
SockJS on the other hand handles the connections exactly as I would like to. From what I have read it seems to be pretty stable while Engine.io has some issues at this time.
My app is running behind an Nginx router on a single domain, therefore I do not need the cross-domain functionality SockJS offers. Because it provides this functionality, however, SockJS does not expose the client's cookie data at all. So far I have had two-factor authorization with Socket.io via cookie AND query string token, and this would not be possible with SockJS (with Engine.io it would).
I have read pretty much everything available about the pros and cons of both, but it seems there is not much being discussed or published so far, especially about Engine.io (there are only 8 questions tagged with engine.io here).
Which of the 2 libraries do you prefer and for which reason? Do you use them in production?
Which one will likely be maintained more actively and could have a major advantage over the other in the future?
Have you looked at Primus? It meets the cookie requirements you mention, it supports all of the major 'real-time'/WebSocket libraries available, and it is a pretty active project. It also sounds like vendor lock-in could be a concern for you, and Primus would address that.
The fact that it uses a plugin system should also a) make it easier for you to extend if needed, and b) mean there may already be a community plugin that does what you need.
Which of the 2 libraries do you prefer and for which reason? Do you use them in production?
I have only used SockJS via the Vert.x API, and it was for an internal project that I would consider 'production', but not a consumer-facing production app. That said, it performed very well.
Which one will likely be maintained more actively and could have a major advantage over the other in the future?
Just looking over the commit history of Engine.io and SockJS, and the fact that Automattic is supporting Engine.io, makes me inclined to think that it will be more stable for a longer period of time, but of course that's debatable. Looking at the issues for Engine.io and SockJS is another good place to evaluate, but since they're both split over multiple repos it should be taken with a grain of salt. I'm not sure where/how Automattic is using Engine.io/Socket.io, but if it's in WordPress.com or one of their plugins, it has had substantial production-at-scale battle testing.
I'd like to redirect you to this (quite detailed) discussion thread about SockJS and Engine.io
https://groups.google.com/forum/#!topic/sockjs/WSIdcY14ciI
Basically,
SockJS detects working transports before marking the connection as open. Engine.io will immediately open the connection and upgrade it later.
Flash, one of the Engine.io fallbacks (and not present in SockJS), loads slowly and, in environments behind proxies, takes 3 seconds to time out. SockJS doesn't use Flash and therefore doesn't need to work around this issue.
SockJS does the upgrade on start. After that you have a consistent experience: you send what you send, you receive what you receive.
Also, as far as I can tell, engine.io-client (the client-side library for engine.io) does not support RequireJS builds, so that's another negative point (SockJS builds perfectly).
You may also consider node-walve: a complete WebSocket basics implementation that is extremely performant, as it is fully stream based.
Example of how to use:
var walve = require('walve');                              // the node-walve package
var server = require('http').createServer().listen(8080); // plain HTTP server to attach to

walve.createServer(function(wsocket) {
  wsocket.on('incoming', function(incoming) {              // fired per incoming message
    incoming.pipe(process.stdout, { end: false });         // stream the payload to stdout
  });
}).listen(server);                                         // attach to the HTTP server
It may not be the best choice if you do not feel at home in the Node.js environment (it does things like extending prototypes for API sugar) or with contributing to the project if needed (though the code is more readable than socket.io's).

WebSocket cross-connection communication (Tornado?)

I'm fumbling around a bit with WebSockets, and was pretty pleased with how easy it was to get a Tornado server running that handles basic WebSocket connections. I've never used Tornado before today, and while I like what I've seen, there are a few questions I have regarding its use.
Primarily, I'm using WebSockets so that I can have low-overhead communication between two or more client machines. (For the purposes of conversation, let's just say it's a chat client.) Obviously I can connect to the server from multiple machines, and they can all push messages to the server and the server can respond, which is great! But that's not much better than standard AJAX requests. If I have a persistent connection, I want to be able to push data to the clients as well. The simplest possible scenario is: user 1 posts a message to the server, and upon receiving it the server immediately pushes it to user 2.
So what would be a good way to accomplish that? As far as I can see in Tornado there's no way to communicate between connections other than placing the message in a datastore somewhere and having all the other connections poll for new info. That strikes me as terribly clunky though, because all you're really doing at that point is moving the polling process from the client to the server.
Of course, I may be barking up the wrong tree entirely here. It's certainly plausible that Tornado simply isn't the right tool for this job, and if that's the case I'd be happy to hear suggestions for alternatives!
Here is a chat server using Tornado, WebSockets, and Redis: https://gist.github.com/pelletier/532067
Though an answer has already been accepted: using a separate service still seems very inefficient to me. Why don't you just go with shared memory plus condition variables or semaphores? It sounds like you have a standard producer-consumer problem.
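To make the non-polling approach concrete: within a single Tornado process you can keep a module-level registry of open handlers and write to them the moment any message arrives, with no datastore and no polling. A minimal sketch (the /chat route and the plain set registry are illustrative choices; once you run multiple processes or machines you would add a shared broker such as the Redis pub/sub used in the gist above):

import asyncio

import tornado.web
import tornado.websocket

CLIENTS = set()  # all currently open connections in this process

class ChatSocket(tornado.websocket.WebSocketHandler):
    def open(self):
        CLIENTS.add(self)

    def on_close(self):
        CLIENTS.discard(self)

    def on_message(self, message):
        # User 1 posts a message; push it straight to every other connected user.
        for client in list(CLIENTS):
            if client is not self:
                client.write_message(message)

async def main():
    tornado.web.Application([(r"/chat", ChatSocket)]).listen(8888)
    await asyncio.Event().wait()

if __name__ == "__main__":
    asyncio.run(main())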
