At what point are WebSockets less efficient than Polling?

While I understand that the answer to the above question is somewhat determined by your application's architecture, I'm interested mostly in very simple scenarios.
Essentially, if my app is pinging every 5 seconds for changes, or every minute, around when will the data being sent to maintain the open Web Sockets connection end up being more than the amount you would waste by simple polling?
Basically, I'm interested in whether there's a way of quantifying how much inefficiency you incur by using frameworks like Meteor if an application doesn't necessarily need real-time updates, but only periodic checks.
Note that my focus here is on bandwidth utilization, not necessarily database access times, since frameworks like Meteor have highly optimized methods of requesting only updates to the database.

The whole point of a websocket connection is that you don't ever have to ping the app for changes. Instead, the client just connects once and then the server can just directly send the client changes whenever they are available. The client never has to ask. The server just sends data when it's available.
For any type of server-initiated data, this is far more efficient with bandwidth than HTTP polling, besides giving you much more timely results (the results are delivered immediately rather than discovered by the client only on the next polling interval).
For pure bandwidth usage, the details would depend upon the exact circumstances. An HTTP polling request has to set up a TCP connection and confirm that connection (even more data if it's an SSL connection), then it has to send the HTTP request, including any relevant cookies that belong to that host and including relevant headers and the GET URL. Then, the server has to send a response. And, most of the time, all of this polling overhead will be completely wasted bandwidth because there's nothing new to report.
A webSocket starts with a simple http request, then upgrades the protocol to the webSocket protocol. The webSocket connection itself need not send any data at all until the server has something to send to the client in which case the server just sends the packet. Sending the data itself has far less overhead too. There are no cookies, no headers, etc... just the data. Even if you use some keep-alives on the webSocket, that amount of data is incredibly tiny compared to the overhead of an HTTP request.
So, exactly how much you would save in bandwidth depends upon the details of the circumstances. If it takes 50 polling requests before finding any useful data, then every one of those HTTP requests is entirely wasted compared to the webSocket scenario. The difference in bandwidth could be enormous.
You asked about an application that only needs periodic checks. As soon as you have a periodic check that results in no data being retrieved, that's wasted bandwidth. That's the whole idea of a webSocket. You consume no bandwidth (or close to no bandwidth) when there's no data to send.
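To make the contrast concrete, here is a minimal client-side sketch of both approaches. This is illustrative only: the /api/changes endpoint, the wss:// URL, and the render() function are placeholders, not a real API.

// Polling: a full HTTP request every 5 seconds, useful or not.
setInterval(async function () {
  const res = await fetch('/api/changes'); // headers + cookies sent every time
  const changes = await res.json();
  if (changes.length) render(changes); // usually empty: wasted bandwidth
}, 5000);

// WebSocket: one connection; bytes flow only when there is data to deliver.
const ws = new WebSocket('wss://example.com/updates');
ws.onmessage = function (event) {
  render(JSON.parse(event.data)); // arrives immediately, no polling interval
};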

I believe @jfriend00 answered the question very clearly. However, I do want to add a thought.
By throwing in a worst case (and improbable) scenario for Websockets vs. HTTP, you would clearly see that a Websocket connection will always have an advantage in regards to bandwidth (and probably all-round performance).
This is the worst case scenario for Websockets vs. HTTP:
your code uses Websocket connections the exact same way it uses HTTP requests, for polling.
(which isn't something you would do, I know, but it is a worst case scenario).
Every polling event is answered positively - meaning that no HTTP requests were performed in vain.
This is the worst possible situation for Websockets, which are designed for pushing data rather than polling... even in this situation Websockets will save you both bandwidth and CPU cycles.
Seriously, even ignoring the DNS query (performed by the client, so you might not care about it) and the TCP/IP handshake (which is expensive for both the client and the server), a Websocket connection is still more performant and cost-effective.
I'll explain:
Each HTTP request includes a lot of data, such as cookies and other headers. In many cases, each HTTP request is also subject to client authentication... rarely is data given away to anybody.
This means that HTTP connections pass all this data (and possibly perform client authentication) once per request. (HTTP is stateless.)
However, Websocket connections are stateful. The data is sent only once (instead of each time a request is made). Client authentication occurs only during the Websocket connection negotiation.
This means that Websocket connections pass the same data (and possibly perform client authentication) once per connection (once for all polls).
So even in this worst case scenario, where polling is always positive and Websockets are used for polling instead of pushing data, Websockets will still save your server both bandwidth and other resources (i.e. CPU time).
I think the answer to your question, simply put, is "never". Websockets are never less efficient than polling.
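To make that worst case concrete, here is a sketch where a websocket is (mis)used for polling, exactly as described above. The URL and the handle() function are placeholders:

const ws = new WebSocket('wss://example.com/poll');
ws.onopen = function () {
  // Each poll is now a tiny frame over the existing connection: no new
  // TCP/TLS handshake, no cookies, no headers -- just the payload.
  setInterval(function () { ws.send('poll'); }, 5000);
};
ws.onmessage = function (event) {
  handle(event.data);
};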

Related

When does server-side push technology become necessary?

Right now, I'm pulling some 10 kB sensor data from a server via a single plain old HTTP request every 5 minutes. In the future, I might want to increase the frequency to make one request every 30 seconds.
When does server-side push technology become necessary?
Obviously, the precise answer depends on the server - but what's the general approach to the issue? Using push technology definitely seems advantageous. However, there would have to be some major code rewriting. Additionally, I feel like 30 seconds is still a long enough interval, and the overhead (e.g. cookies in HTTP headers, ...) shouldn't cause too much surplus traffic.
Push technology is useful for any of the following situations:
You need the client to have low latency when there is new data (e.g. not wait for the next polling interval, but to find out within seconds or even ms when there is new data).
When you're interested in minimizing overhead on your server or bandwidth usage or power consumption on the client, yet the client needs to know in a fairly timely fashion when new data is available. Doing frequent polling from a mobile client chews up bandwidth and battery.
When data availability is unpredictable and regular polling would usually result in no data being available. If every poll results in data being collected and the timeliness of a moderate polling interval is sufficient for your application, then there are no big gains from switching to a push notification mechanism. When the poll usually results in an empty response, that's when polling becomes very inefficient.
If you're sending data regularly and you are trying to minimize bandwidth. A single webSocket packet is way more efficient than an HTTP request, as the HTTP request includes headers, cookies, etc. that need not be sent with a single webSocket packet once the webSocket connection has already been established (see the sketch below).
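As a rough illustration of the push model, here is a minimal server sketch. It assumes Node.js and the popular ws package; readSensor() is a placeholder for your actual data source:

const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', function (socket) {
  // Push a reading every 30 seconds; no client request needed.
  const timer = setInterval(function () {
    socket.send(JSON.stringify(readSensor()));
  }, 30000);
  socket.on('close', function () { clearInterval(timer); });
});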
Some other references on the topic:
What are the pitfalls of using Websockets in place of RESTful HTTP?
Ajax vs Socket.io
websocket vs rest API for real time data?
Cordova: Sockets, PushNotifications, or repeatedly polling server?
HTML5 WebSocket: A Quantum Leap in Scalability for the Web

What do you get with HTML5 web sockets that you can't have with AJAX?

Ian Hickson says:
"I expect the iframe sandboxing feature will be a big boon to developers if it takes off. My own personal favorite feature is probably the Web Sockets API, which allows two-way communication with a server so that you can implement games, chatting, remote controls, and so forth."
What can you get with web sockets that you can't get with AJAX? Is it just convenience, or is it somehow more efficient? Is it that the server can send data to the client, without having to wait for a message so it can respond?
Yes, it's all about the server being able to push data to the client. Currently, simulating bi-directional communication without Flash/Silverlight/Java/ActiveX takes the form of one of two workarounds:
Traditional polling: Clients make small requests to the server frequently, checking for updates. Even if no update has occurred, the client doesn't know that and must continuously poll for updates. Though each request may be lightweight, constant polling by many clients can add up quickly.
Long polling: Clients make periodic requests for updates, like regular polling, but if there are no updates yet available then the server does not respond immediately and holds the connection open. When an update is finally available, the server pushes that down to the client, which acts on it and then repeats that process. Long polling offers push-like update resolution, but is basically a self-inflicted DDoS attack and can be very resource intensive for many types of web servers.
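In code, a long-polling client is roughly the following loop (the /updates endpoint and apply() are placeholders):

async function longPoll() {
  while (true) {
    try {
      // The server holds this request open until an update exists (or it times out).
      const res = await fetch('/updates');
      if (res.ok) apply(await res.json());
    } catch (err) {
      await new Promise(function (r) { setTimeout(r, 1000); }); // back off on errors
    }
  }
}
longPoll();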
With WebSockets, you get all of the responsiveness advantages of long polling, with dramatically less server-side overhead.
WebSockets are more efficient (and "more real-time") than AJAX calls because you keep the connection open and don't send extra protocol headers and other stuff with each request and response. Look at this article:
"During making connection with WebSocket, client and server exchange data per frame which is 2 bytes each, compared to 8 kilobytes of HTTP header when you do continuous polling."

What is the disadvantage of using websocket/socket.io where ajax will do?

Similar questions have been asked before and they all reached the conclusion that AJAX will not become obsolete. But in what ways is ajax better than websockets?
With socket.io, it's easy to fall back to flash or long polling, so browser compatibility seems to be a non-issue.
Websockets are bidirectional. Where ajax would make an asynchronous request, a websocket client would send a message to the server. The POST/GET parameters can be encoded in JSON.
So what is wrong with using 100% websockets? If every visitor maintains a persistent websocket connection to the server, would that be more wasteful than making a few ajax requests throughout the visit session?
I think it would be more wasteful. For every connected client you need some sort of object/function/code/whatever on the server paired up with that one client. A socket handler, or a file descriptor, or however your server is setup to handle the connections.
With AJAX you don't need a 1:1 mapping of server-side resources to clients. Your number of clients can scale more independently of your server-side resources. Even node.js has its limits on how many connections it can handle and keep open.
The other thing to consider is that certain AJAX responses can be cached too. As you scale up you can add an HTTP cache to help reduce the load from frequent AJAX requests.
Short Answer
Keeping a websocket active has a cost for both the client and the server, whereas Ajax incurs its cost only once per request, depending on what you're doing with it.
Long Answer
Websockets are often misunderstood because of this whole "Hey, use Ajax, that will do!". No, Websockets are not a replacement for Ajax. They can potentially be applied to the same fields, but there are cases where using a Websocket is absurd.
Let's take a simple example: a dynamic page which loads data after the page is loaded on the client side. It's simple: make an Ajax call. We only need one direction, from the server to the client. The client will ask for the data, the server will send it to the client, done. Why would you implement websockets for such a task? You don't need your connection to be open all the time, you don't need the client to constantly ask the server, you don't need the server to notify the client. The connection would stay open and waste resources, because to keep a connection open you need to constantly check it.
Now for a chat application things are totally different. You need your client to be notified by the server instead of forcing the client to ask the server every x seconds or milliseconds if something is new. It would make no sense.
To understand better, think of it as two people. One of the two is the server, the other is the client. Ajax is like sending a letter: the client sends a letter, the server responds with another letter. For a chat application, the conversation would go like this:
"Hey Server, got something for me ?
- No.
- Hey Server, got something for me ?
- No.
- Hey Server, got something for me ?
- Yes, here it is."
The server can't actually send a letter to the client if the client never asked for an answer. It's a huge waste of resources, because for every Ajax request, even if it's cached, the server has to perform an operation.
Now back to the case I discussed earlier, the data loaded once with Ajax. Imagine the client is on the phone with the server. Keeping the connection active has a cost: it costs electricity and you have to pay your operator. Now why would you call someone and keep them on the phone for an hour if you just want that person to tell you 3 words? Send a goddamn letter.
In conclusion, Websockets are not a total replacement for Ajax!
Sometimes you will need Ajax where Websocket usage is absurd.
Edit: The SSE case
That technology isn't used very widely, but it can be useful. As its name states, Server-Sent Events are a one-way push from the server to the client. The client doesn't request anything; the server just sends the data (a minimal example follows the summary below).
In short:
- Unidirectional from the client: Ajax
- Unidirectional from the server: SSE
- Bidirectional: Websockets
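For completeness, SSE is available in browsers through the standard EventSource API. A minimal sketch (the /events endpoint is a placeholder):

const source = new EventSource('/events');
source.onmessage = function (event) {
  console.log('server pushed:', event.data); // the client never has to ask
};
// The browser reconnects automatically if the connection drops.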
Personally, I think that websockets will be used more and more in web applications instead of AJAX. They are not well suited to web sites where caching and SEO are of greater concern, but they will do wonders for webapps.
Projects such as DNode and socketstream help to remove the complexity and enable simple RPC-style coding. This means your client code just calls a function on the server, passing whatever data to that function it wants. And the server can call a function on the client and pass it data as well. You don't need to concern yourself with the nitty-gritty of TCP.
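The general shape of such RPC-style coding can be sketched in a few lines. This is a hypothetical illustration of the pattern, not the actual DNode or socketstream API:

const ws = new WebSocket('wss://example.com/rpc'); // placeholder URL
let nextId = 0;
const pending = new Map();

// Call a named function on the server and get a Promise for its result.
function call(method, params) {
  return new Promise(function (resolve) {
    const id = ++nextId;
    pending.set(id, resolve);
    ws.send(JSON.stringify({ id: id, method: method, params: params }));
  });
}

ws.onmessage = function (event) {
  const msg = JSON.parse(event.data);
  const resolve = pending.get(msg.id);
  if (resolve) { pending.delete(msg.id); resolve(msg.result); }
};

// Usage reads like a local function call:
// const user = await call('getUser', { id: 42 });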
Furthermore, there is a lot of overhead with AJAX calls. For instance, a connection needs to be established and HTTP headers (cookies, etc.) are passed with every request. Websockets eliminate much of that. Some say that websockets are more wasteful, and perhaps they are right. But I'm not convinced that the difference is really that substantial.
I answered another related question in detail, including many links to related resources. You might check it out:
websocket api to replace rest api?
I think that sooner or later websocket-based frameworks will start to pop up, not just for writing real-time chat-like parts of web apps, but also as standalone web frameworks. Once a permanent connection is created, it can be used for receiving all kinds of stuff, including UI parts of the web application which are now served, for example, through AJAX requests. This approach may hurt SEO in some ways, although it can reduce the amount of traffic and load generated by asynchronous requests, which include redundant HTTP headers.
However, I doubt that websockets will replace or endanger AJAX, because there are numerous scenarios where permanent connections are unnecessary or unwanted. For example, mashup applications using one-time, single-purpose REST-based services don't need to be permanently connected with clients.
There's nothing "wrong" about it.
The difference is mostly readability. The main advantage of Ajax is that it allows you fast development because most of the functionality is written for you.
There's a great advantage in not having to re-invent the wheel every time you want to open a socket.
WS:// connections have far less overhead than "AJAX" requests.
As other people said, keeping the connection open can be overkill in some scenarios where you don't need server-to-client notifications, or where client-to-server requests happen with low frequency.
But another disadvantage is that websockets are a low-level protocol, offering no additional features on top of TCP once the initial handshake is performed. So when implementing a request-response paradigm over websockets, you will probably miss features that HTTP (a very mature and extensive protocol family) offers, like caching (client and shared caches), validation (conditional requests), safety and idempotence (with implications on how the agent behaves), range requests, content types, status codes, ...
That is, you reduce message sizes at a cost.
So my choice is AJAX for request-response, and websockets for server pushing and high-frequency, low-latency messaging.
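As one example of what you would have to reinvent over websockets, HTTP validation (conditional requests) comes for free. A sketch with a hypothetical /data endpoint:

let etag = null;
let cached = null;

async function getData() {
  const headers = etag ? { 'If-None-Match': etag } : {};
  const res = await fetch('/data', { headers: headers });
  if (res.status === 304) return cached; // unchanged: the body is never resent
  etag = res.headers.get('ETag');
  cached = await res.json();
  return cached;
}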
If you want to keep the connection to the server open and there will be continuous polling of the server, then go for sockets; otherwise, you are good to go with ajax.
Simple Analogy:
Ajax asks questions (requests) of the server, and the server gives answers (responses) to these questions. Now, if you want to ask continuous questions, ajax won't work well: it has a large overhead which requires resources at both ends.

Web sockets make ajax/CORS obsolete?

Will web sockets when used in all web browsers make ajax obsolete?
Cause if I could use web sockets to fetch data and update data in realtime, why would I need ajax? Even if I use ajax to just fetch data once when the application started I still might want to see if this data has changed after a while.
And will web sockets be possible in cross-domains or only to the same origin?
WebSockets will not make AJAX entirely obsolete and WebSockets can do cross-domain.
AJAX
AJAX mechanisms can be used with plain web servers. At its most basic level, AJAX is just a way for a web page to make an HTTP request. WebSockets is a much lower level protocol and requires a WebSockets server (either built into the webserver, standalone, or proxied from the webserver to a standalone server).
With WebSockets, the framing is lightweight and the payload is determined by the application. You could send HTML/XML/JSON back and forth between client and server, but you aren't forced to. AJAX is HTTP. WebSockets has an HTTP-friendly handshake, but WebSockets is not HTTP. WebSockets is a bi-directional protocol that is closer to raw sockets (intentionally so) than it is to HTTP. The WebSockets payload data is UTF-8 encoded in the current version of the standard, but this is likely to be changed/extended in future versions.
So there will probably always be a place for AJAX-type requests, even in a world where all clients support WebSockets natively. WebSockets is trying to solve situations where AJAX is not capable or only marginally capable (because WebSockets is bi-directional and has much lower overhead). But WebSockets does not replace everything AJAX is used for.
Cross-Domain
Yes, WebSockets supports cross-domain. The initial handshake to setup the connection communicates origin policy information. The wikipedia page shows an example of a typical handshake: http://en.wikipedia.org/wiki/WebSockets
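For reference, the opening handshake looks roughly like this (adapted from the example in RFC 6455); the Origin header is how the server learns which site is connecting and applies its cross-origin policy:

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Version: 13

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=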
I'll try to break this down into questions:
Will web sockets when used in all web browsers make ajax obsolete?
Absolutely not. WebSockets are raw socket connections to the server. This comes with its own security concerns. AJAX calls are simply async HTTP requests that can follow the same validation procedures as the rest of the pages.
Cause if I could use web sockets to fetch data and update data in realtime, why would I need ajax?
You would use AJAX for simpler, more manageable tasks. Not everyone wants the overhead of securing a socket connection simply to allow async requests. That can be handled simply enough.
Even if I use ajax to just fetch data once when the application started I still might want to see if this data has changed after a while.
Sure, if that data is changing. You may not have the data changing or constantly refreshing. Again, this is code overhead that you have to account for.
And will web sockets be possible in cross-domains or only to the same origin?
You can have cross domain WebSockets but you have to code your WS server to accept them. You have access to the domain (host) header which you can then use to accept / deny requests. This can, however, be spoofed by something as simple as nc. In order to truly secure the connection you will need to authenticate the connection by other means.
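A minimal sketch of such an origin check, assuming Node.js with the ws package (the allowed origin is a placeholder, and, as noted, the header alone is not proof of identity):

const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', function (socket, req) {
  if (req.headers.origin !== 'https://app.example.com') {
    socket.close(1008, 'origin not allowed'); // 1008 = policy violation
    return;
  }
  // Origin accepted; still authenticate by other means (e.g. a token).
});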
Websockets have a couple of big downsides in terms of scalability that ajax avoids. Since ajax sends a request, gets a response, and closes the connection (or shortly after), if someone stays on the web page it doesn't use server resources while idling. Websockets are meant to stream data back to the browser, and they tie up server resources to do so. Servers have a limit on how many simultaneous connections they can keep open at one time. Not to mention, depending on your server-side technology, they may tie up a thread to handle the socket. So websockets have more resource-intensive requirements for both sides per connection. You could easily exhaust all of your threads servicing clients, and then no new clients could come in if lots of users are just sitting on the page. This is where nodejs, vertx, and netty can really help out, but even those have upper limits as well.
Also there is the issue of the state of the underlying socket, and writing the code on both sides that carries on the stateful conversation, which isn't something you have to do with ajax because it's stateless. Websockets require you to create a low-level protocol, which is solved for you with ajax. Things like heartbeating, closing idle connections, reconnection on errors, etc. are vitally important now. These are things you didn't have to solve when using AJAX because it was stateless. State is very important to the stability of your app and, more importantly, the health of your server. It's not trivial. Pre-HTTP we built a lot of stateful TCP protocols (FTP, telnet, SSH), and then HTTP happened, and no one did that stuff much anymore because, even with its limitations, HTTP was surprisingly easier and more robust. Websockets bring back the good and the bad of stateful protocols. You'll learn soon enough if you didn't get a dose of that last go around.
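To give a feel for that bookkeeping, here is a sketch of a client that adds an application-level heartbeat and reconnection with backoff; the URL and handle() are placeholders:

function connect(url, delay) {
  const ws = new WebSocket(url);
  let heartbeat;

  ws.onopen = function () {
    delay = 1000; // reset the backoff after a successful connect
    heartbeat = setInterval(function () { ws.send('ping'); }, 30000);
  };
  ws.onmessage = function (event) {
    if (event.data !== 'pong') handle(event.data);
  };
  ws.onclose = function () {
    clearInterval(heartbeat);
    setTimeout(function () { connect(url, Math.min(delay * 2, 30000)); }, delay);
  };
}
connect('wss://example.com/feed', 1000);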
If you need streaming of realtime data this extra overhead is warranted because polling the server to get streamed data is worse, but if all you are doing is user interaction->request->response->update UI, then ajax is easier and will use less resources because once the response is sent the conversation is over and no additional server resources are used. So I think it's a tradeoff and the architect has to decide which tool fits their problem. AJAX has its place, and websockets have their place.
Update
So the architecture of your server is what matters when we are talking about threads. If you are using a traditionally multi-threaded server (or processes) where each socket connection gets its own thread to respond to requests, then websockets matter a lot to you. For each connection we have a socket, and eventually the OS will fall over if you have too many of these; the same goes for threads (more so for processes). Threads are heavier than sockets (in terms of resources), so we try to conserve how many threads we have running simultaneously. That means creating a thread pool, which is just a fixed number of threads shared among all sockets. But once a socket is opened, the thread is used for the entire conversation. The length of those conversations governs how quickly you can repurpose those threads for new sockets coming in. The length of your conversation governs how much you can scale. However, if you are streaming, this model doesn't work well for scaling. You have to break the thread/socket design.
HTTP's request/response model makes it very efficient at turning over threads for new sockets. If you are just going to use request/response, use HTTP; it's already built and much easier than reimplementing something like that over websockets.
Since websockets don't have to be request/response like HTTP and can stream data, if your server has a fixed number of threads in its thread pool and you have the same number of websockets tying up all of your threads with active conversations, you can't service new clients coming in! You've reached your maximum capacity. That's where protocol design is important too, with websockets and threads. Your protocol might allow you to loosen the thread-per-socket-per-conversation model, so that people just sitting there don't use a thread on your server.
That's where asynchronous single thread servers come in. In Java we often call this NIO for non-blocking IO. That means it's a different API for sockets where sending and receiving data doesn't block the thread performing the call.
So traditionally, with blocking sockets, when you call socket.read() or socket.write(), they wait until the data is received or sent before returning control to your program. That means your program is stuck waiting for the socket data to come in or go out before it can do anything else. That's why we have threads: so we can do work concurrently (at the same time). Send this data to client X while I wait on data from client Y. Concurrency is the name of the game when we talk about servers.
In a NIO server we use a single thread to handle all clients and register callbacks to be notified when data arrives. For example (Node.js-style):
socket.on('data', function (data) {
// data is here! Now you can process it very quickly without waiting!
});
Registering the handler returns immediately without reading any data, but the function we provided will be called whenever data comes in. This design radically changes how you build and architect your code, because if you get hung up waiting on something, you can't receive any new clients. You have a single thread; you can't really do two things at once! You have to keep that one thread moving.
NIO, asynchronous IO, event-based programming, as this is all known, is a much more complicated system design, and I wouldn't suggest you try to write this if you are starting out. Even very senior programmers find it very hard to build robust systems. Since you are asynchronous, you can't call APIs that block. Reading data from the DB or sending messages to other servers has to be performed asynchronously. Even reading/writing from the file system can slow your single thread down, lowering your scalability. Once you go asynchronous, it's all asynchronous all the time if you want to keep the single thread moving. That's where it gets challenging, because eventually you'll run into an API, like DBs, that is not asynchronous, and you have to adopt more threads at some level. So hybrid approaches are common even in the asynchronous world.
The good news is there are other solutions that use this lower-level API already built that you can use: NodeJS, Vertx, Netty, Apache Mina, Play Framework, Twisted Python, Stackless Python, etc. There might be some obscure library for C++, but honestly I wouldn't bother. Server technology doesn't require the very fastest languages, because it's IO-bound more than CPU-bound. If you are a die-hard performance nut, use Java. It has a huge community of code to pull from, and its speed is very close to (and sometimes better than) C++. If you just hate it, go with Node or Python.
Yes, yes it does. :D
The earlier answers lack imagination. I see no more reason to use AJAX if websockets are available to you.

Push or Pull for a near real time automation server?

We are currently developing a server whereby a client registers interest in changes to specific data elements, and when that data changes the server pushes the data back to the client. There has been vigorous debate at work about whether or not it would be better for the client to poll for this data.
What is considered to be the ideal method, in terms of performance, scalability and network load, of data transfer in a near real time environment?
Update:
Here's a link that gives some food for thought with regards to UI updates.
There's probably no ideal method for every situation, but push is usually better and used more often. It allows you to optimize server caching and data transfers, which helps performance and scalability, and it cuts network traffic a bit by avoiding client requests and empty responses. It can be an important advantage for a server to operate at its own pace and supply clients with data when it is ready.
Industry standards - such as OPC and GID - support both. The server pushes updates to subscribed clients, but a client can pull some rarely used data without bothering with a subscription.
As long as the client initiates the connection (to get past firewall and NAT problems), either way is fine.
If there are several different types of data you need to send, you might want to have the client specify which types it wants, but this is only needed once per connection. Then you can have the server continue to send updates as it has them (see the subscription sketch below).
It would generate less network traffic to have the server send updates without the client continually asking for them.
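That subscribe-once, push-forever flow might look like this on the client (the URL, the topic names, and the update() function are placeholders):

const ws = new WebSocket('wss://example.com/feed');
ws.onopen = function () {
  // State interests once per connection; the server pushes from then on.
  ws.send(JSON.stringify({ subscribe: ['temperature', 'pressure'] }));
};
ws.onmessage = function (event) {
  update(JSON.parse(event.data));
};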
What do you have on the client's side? Many firewalls allow outgoing requests but block incoming requests. In other words, pull may be your only option if you are crossing the Internet unless you are sending out e-mails.
