Push or Pull for a near real time automation server? - performance

We are currently developing a server whereby a client requests interest in changes to specific data elements and when that data changes the server pushes the data back to the client. There has vigorous debate at work about whether or not it would be better for the client to poll for this data.
What is considered to be the ideal method, in terms of performance, scalability and network load, of data transfer in a near real time environment?
Update:
Here's a Link that gives some food for thought with regards to UI updates.

There's probably no ideal method for every situation, but push is usually better and used more often. It allows to optimize server caching and data transfers, which helps performance and scalability, and cuts network traffic a bit by avoiding client requests and empty responses. It can be important advantage for a server to operate in it's own pace and supply clients with data when it is ready.
Industry standarts - such as OPC, GID - support both. Server pushes updates to subscribed clients, but client can pull some rarely used data out without bothering with subscription.

As long as the client initiates the connection (to get passed firewall and NAT problems) either way is fine.
If there are several different type of data you need to send, you might want to have the client specify which type he wants, but this is only needed once per connection. Then you can have the server continue to send updates as it has them.
It would be less network traffic to have the server send updates without the client continually asking for updates.

What do you have on the client's side? Many firewalls allow outgoing requests but block incoming requests. In other words, pull may be your only option if you are crossing the Internet unless you are sending out e-mails.

Related

Should a websocket connection be general or specific?

Should a websocket connection be general or specific?
e.g. If I was building a stock trading system, I'd likely to have real time stock prices, real time trade information, real time updates to the order book, perhaps real time chat to enable traders to collude and manipulate the market. Should I have one websocket to handle all the above data flow or is it better to have several websocket to handle different topics?
It all depends. Let's look at your options, assuming your stock trader, your chat, and your order book are built as separate servers/micro-services.
One WebSocket for each server
You can have each server running their own WebSocket server, streaming events relevant to that server.
Pros
It is a simple approach. Each server is independent.
Cons
Scales poorly. The number of open TCP connections will come at a price as the number of concurrent users increases. Increased complexity when you need to replicate the servers for redundancy, as all replicas needs to broadcast the same events. You also have to build your own fallback for recovering from client data going stale due to lost WebSocket connection. Need to create event handlers on the client for each type of event. Might have to add version handling to prevent data races if initial data is fetched over HTTP, while events are sent on the separate WebSocket connection.
Publish/Subscribe event streaming
There are many publish/subscribe solutions available, such as Pusher, PubNub or SocketCluster. The idea is often that your servers publish events on a topic/subject to a message queue, which is listened to by WebSocket servers that forwards the events to the connected clients.
Pros
Scales more easily. The server only needs to send one message, while you can add more WebSocket servers as the number of concurrent users increases.
Cons
You most likely still have to handle recovery from events lost during disconnect. Still might require versioning to handle data races. And still need to write handlers for each type of event.
Realtime API gateway
This part is more shameless, as it covers Resgate, an open source project I've been involved in myself. But it also applies to solutions such as Firebase. With the term "realtime API gateway", I mean an API gateway that not only handles HTTP requests, but operates bidirectionally over WebSocket as well.
With web clients, you are seldom interested in events - you are interested in change of state. Events are just means to either describe the changes. By fetching the data through a gateway, it can keep track on which resources the client is currently interested in. It will then keep the client up to date for as long as the data is being used.
Pros
Scales well. Client requires no custom code for event handling, as the system updates the client data for you. Handles recovery from lost connections. No data races. Simple to work with.
Cons
Primarily for client rendered web sites (using React, Vue, Angular, etc), as it works poorly with sites with server-rendered pages. Harder to apply to already existing HTTP API's.

At what point are WebSockets less efficient than Polling?

While I understand that the answer to the above question is somewhat determined by your application's architecture, I'm interested mostly in very simple scenarios.
Essentially, if my app is pinging every 5 seconds for changes, or every minute, around when will the data being sent to maintain the open Web Sockets connection end up being more than the amount you would waste by simple polling?
Basically, I'm interested in if there's a way of quantifying how much inefficiency you incur by using frameworks like Meteor if an application doesn't necessarily need real-time updates, but only periodic checks.
Note that my focus here is on bandwidth utilization, not necessarily database access times, since frameworks like Meteor have highly optimized methods of requesting only updates to the database.
The whole point of a websocket connection is that you don't ever have to ping the app for changes. Instead, the client just connects once and then the server can just directly send the client changes whenever they are available. The client never has to ask. The server just sends data when it's available.
For any type of server-initiated-data, this is way more efficient with bandwidth than http polling. Besides giving you much more timely results (the results are delivered immediately rather than discovered by the client only on the next polling interval).
For pure bandwidth usage, the details would depend upon the exact circumstances. An http polling request has to set up a TCP connection and confirm that connection (even more data if its an SSL connection), then it has to send the http request, including any relevant cookies that belong to that host and including relevant headers and the GET URL. Then, the server has to send a response. And, most of the time all of this overhead of polling will be completely wasted bandwidth because there's nothing new to report.
A webSocket starts with a simple http request, then upgrades the protocol to the webSocket protocol. The webSocket connection itself need not send any data at all until the server has something to send to the client in which case the server just sends the packet. Sending the data itself has far less overhead too. There are no cookies, no headers, etc... just the data. Even if you use some keep-alives on the webSocket, that amount of data is incredibly tiny compared to the overhead of an HTTP request.
So, how exactly much you would save in bandwidth depends upon the details of the circumstances. If it takes 50 polling requests before it finds any useful data, then every one of those http requests is entirely wasted compared to the webSocket scenario. The difference in bandwidth could be enormous.
You asked about an application that only needs periodic checks. As soon as you have a periodic check that results in no data being retrieved, that's wasted bandwidth. That's the whole idea of a webSocket. You consume no bandwidth (or close to no bandwidth) when there's no data to send.
I believe #jfriend00 answered the question very clearly. However, I do want to add a thought.
By throwing in a worst case (and improbable) scenario for Websockets vs. HTTP, you would clearly see that a Websocket connection will always have an advantage in regards to bandwidth (and probably all-round performance).
This is the worst case scenario for Websockets v/s HTTP:
your code uses Websocket connections the exact same way it uses HTTP requests, for polling.
(which isn't something you would do, I know, but it is a worst case scenario).
Every polling event is answered positively - meaning that no HTTP requests were performed in vain.
This is the worst possible situation for Websockets, which are designed for pushing data rather than polling... even in this situation Websockets will save you both bandwidth and CPU cycles.
Seriously, even ignoring the DNS query (performed by the client, so you might not care about it) and the TCP/IP handshake (which is expensive for both the client and the server), a Websocket connection is still more performant and cost-effective.
I'll explain:
Each HTTP request includes a lot of data, such as cookies and other headers. In many cases, each HTTP request is also subject to client authentication... rarely is data given away to anybody.
This means that HTTP connections pass all this data (and possibly perform client authentication) once per request.[Stateless]
However, Websocket connections are stateful. The data is sent only once (instead of each time a request is made). Client authentication occurs only during the Websocket connection negotiation.
This means that Websocket connections pass the same data (and possibly perform client authentication) once per connection (once for all polls).
So even in this worst case scenario, where polling is always positive and Websockets are used for polling instead of pushing data, Websockets will still save your server both bandwidth and other resources (i.e. CPU time).
I think the answer to your question, simply put, is "never". Websockets are never less efficient than polling.

What do you get with HTML5 web sockets that you can't have with AJAX?

Ian Hickson says:
I expect the iframe sandboxing feature
will be a big boon to developers if it
takes off. My own personal favorite
feature is probably the Web Sockets
API, which allows two-way
communication with a server so that
you can implement games, chatting,
remote controls, and so forth.
What can you get with web sockets that you can't get with AJAX? Is it just convenience, or is it somehow more efficient? Is it that the server can send data to the client, without having to wait for a message so it can respond?
Yes, it's all about the server being able to push data to the client. Currently, simulating bi-directional communication without Flash/Silverlight/Java/ActiveX takes the form of one of two workarounds:
Traditional polling: Clients make small requests to the server frequently, checking for updates. Even if no update has occurred, the client doesn't know that and must continuously poll for updates. Though each request may be lightweight, constant polling by many clients can add up quickly.
Long polling: Clients make periodic requests for updates, like regular polling, but if there are no updates yet available then the server does not respond immediately and holds the connection open. When an update is finally available, the server pushes that down to the client, which acts on it and then repeats that process. Long polling offers push-like update resolution, but is basically a self-inflicted DDoS attack and can be very resource intensive for many types of web servers.
With WebSockets, you get all of the responsiveness advantages of long polling, with dramatically less server-side overhead.
WebSockets are more efficient (and "more real-time") than AJAX calls because you keep connection open and don't send extra protocol headers and other stuff after each request and response. Look at this article:
During making connection with
WebSocket, client and server exchange
data per frame which is 2 bytes each,
compared to 8 kilo bytes of http
header when you do continuous polling.

What is the disadvantage of using websocket/socket.io where ajax will do?

Similar questions have been asked before and they all reached the conclusion that AJAX will not become obsolete. But in what ways is ajax better than websockets?
With socket.io, it's easy to fall back to flash or long polling, so browser compatibility seems to be a non-issue.
Websockets are bidirectional. Where ajax would make an asynchronous request, websocket client would send a message to the server. The POST/GET parameters can be encoded in JSON.
So what is wrong with using 100% websockets? If every visitor maintains a persistent websocket connection to the server, would that be more wasteful than making a few ajax requests throughout the visit session?
I think it would be more wasteful. For every connected client you need some sort of object/function/code/whatever on the server paired up with that one client. A socket handler, or a file descriptor, or however your server is setup to handle the connections.
With AJAX you don't need a 1:1 mapping of server side resource to client. Your # of clients can scale less dependently than your server-side resources. Even node.js has its limitations to how many connections it can handle and keep open.
The other thing to consider is that certain AJAX responses can be cached too. As you scale up you can add an HTTP cache to help reduce the load from frequent AJAX requests.
Short Answer
Keeping a websocket active has a cost, for both the client and the server, whether Ajax will have a cost only once, depending on what you're doing with it.
Long Answer
Websockets are often misunderstood because of this whole "Hey, use Ajax, that will do !". No, Websockets are not a replacement for Ajax. They can potentially be applied to the same fields, but there are cases where using Websocket is absurd.
Let's take a simple example : A dynamic page which loads data after the page is loaded on the client side. It's simple, make an Ajax call. We only need one direction, from the server to the client. The client will ask for these data, the server will send them to the client, done. Why would you implement websockets for such a task ? You don't need your connection to be opened all the time, you don't need the client to constantly ask the server, you don't need the server to notify the client. The connection will stay open, it will waste resources, because to keep a connection open you need to constantly check it.
Now for a chat application things are totally different. You need your client to be notified by the server instead of forcing the client to ask the server every x seconds or milliseconds if something is new. It would make no sense.
To understand better, see that as two persons. One of the two is the server, the over is the client. Ajax is like sending a letter. The client sends a letter, the server responds with another letter. The fact is that, for a chat application the conversation would be like that :
"Hey Server, got something for me ?
- No.
- Hey Server, got something for me ?
- No.
- Hey Server, got something for me ?
- Yes, here it is."
The server can't actually send a letter to the client, if the client never asked for an answer. It's a huge waste of resources. Because for every Ajax request, even if it's cached, you need to make an operation on the server side.
Now the case I discussed earlier with the data loaded with Ajax. Imagine the client is on the phone with the server. Keeping the connection active has a cost. It costs electricity and you have to pay your operator. Now why would you need to call someone and keep him on phone for an hour, if you just want that person to tell you 3 words ? Send a goddamn letter.
In conclusion Websockets are not a total replacement for Ajax !
Sometimes you will need Ajax where Websocket usage is absurd.
Edit : The SSE case
That technology isn't used very widely but it can be useful. As its name states it, Server-Sent Events are a one-way push from the server to the client. The client doesn't request anything, the server just sends the data.
In short :
- Unidirectional from the client : Ajax
- Unidirectional from the server : SSE
- Bidirectional : Websockets
Personally, I think that websockets will be used more and more in web applications instead of AJAX. They are not well suited to web sites where caching and SEO are of greater concern, but they will do wonders for webapps.
Projects such as DNode and socketstream help to remove the complexity and enable simple RPC-style coding. This means your client code just calls a function on the server, passing whatever data to that function it wants. And the server can call a function on the client and pass it data as well. You don't need to concern yourself with the nitty gritties of TCP.
Furthermore, there is a lot of overhead with AJAX calls. For instance, a connection needs to be established and HTTP headers (cookies, etc.) are passed with every request. Websockets eliminate much of that. Some say that websockets are more wasteful, and perhaps they are right. But I'm not convinced that the difference is really that substantial.
I answered another related question in detail, including many links to related resources. You might check it out:
websocket api to replace rest api?
I think that sooner or later websocket based frameworks will start to popup not just for writing real-time chat like parts of web apps, but also as standalone web frameworks. Once permanent connection is created it can be used for receiving all kinds of stuff including UI parts of web application which are now served for example through AJAX requests. This approach may hurt SEO in some way although it can reduce amount of traffic and load generated by asynchronous requests which includes redundant HTTP headers.
However I doubt that websockets will replace or endanger AJAX because there are numerous scenarios where permanent connections are unnecessary or unwanted. For example mashup applications which are using (one time) single purpose REST based services that doesn't need to be permanently connected with clients.
There's nothing "wrong" about it.
The only difference is mostly readability. The main advantage of Ajax is that it allows you fast development because most of the functionality is written for you.
There's a great advantage in not having to re-invent the wheel every time you want to open a socket.
WS:// connections have far less overhead than "AJAX" requests.
As other people said, keeping the connection open can be overkill in some scenarios where you don't need server to client notifications, or client to server request happens with low frecuency.
But another disadvantage is that websockets is a low level protocol, not offering additional features to TCP once the initial handshake is performed. So when implementing a request-response paradigm over websockets, you will probably miss features that HTTP (a very mature and extense protocol family) offers, like caching (client and shared caches), validation (conditional requests), safety and idempotence (with implications on how the agent behaves), range requests, content types, status codes, ...
That is, you reduce message sizes at a cost.
So my choice is AJAX for request-response, websockets for server pushing and high frequency low latency messaging
If you want the connection to server open and if continuous polling to the server will be there then go for sockets else you are good to go with ajax.
Simple Analogy :
Ajax asks questions(requests) to server and server gives answers(responses) to these questions. Now if you want to ask continuous questions then ajax wont work, it has a large overhead which will require resources at both the ends.

Progress notifications from HTTP/REST service

I'm working on a web application that submits tasks to a master/worker system that farms out the tasks to any of a series of worker instances. The work queue master runs as a separate process (on a separate machine altogether) and tasks are submitted to the master via HTTP/REST requests. Once tasks are submitted to the work queue, client applications can submit another HTTP request to get status information about tasks.
For my web application, I'd like it to provide some sort of progress bar view that gives the user some indication of how far along task processing has come. The obvious way to implement this would be an AJAX progress meter widget that periodically polls the work queue for status on the tasks that have been submitted. My question is, is there a better way to accomplish this without the frequent polling?
I've considered having the client web application open up a server socket on which it could listen for notifications from the work master. Another similar thought I've had is to use XMPP or a similar protocol for the status notifications. (Of course, the master/worker system would need to be updated to provide notifications either way but I own the code for that so can make any necessary updates myself.)
Any thoughts on the best way to set up a notification system like this? Is the extra effort involved worth it, or is the simple polling solution the way to go?
Polling
The client keeps polling the server to get the status of the response.
Pros
Being really RESTful means cacheable and scaleable.
Cons
Not the best responsiveness if you do not want to poll your server too much.
Persistent connection
The server does not close its HTTP connection with the client until the response is complete. The server can send intermediate status through this connection using HTTP multiparts.
Comet is the most famous framework to implement this behaviour.
Pros
Best responsiveness, almost real-time notifications from the server.
Cons
Connection limit is limited on a web server, keeping a connection open for too long might, at best load your server, at worst open the server to Denial of Service attacks.
Client as a server
Make the server post status updates and the response to the client as if it were another RESTful application.
Pros
Best of every worlds, no resources are wasted waiting for the response, either on the server or on the client side.
Cons
You need a full HTTP server and web application stack on the client
Firewalls and routers with their default "no incoming connections at all" will get in the way.
Feel free to edit to add your thoughts or a new method!
I guess it depends on a few factors
How accurate the feedback can be (1 percent, 5 percent, 50 percent) Accurate feedback makes it worth pursuing some kind of progress bar and comet style push. If you can only say "Busy... hold on... almost there... done" then a simple ajax "are we there yet" poll is certainly easier to code.
How timely the Done message has to be seen by the client
How long each task takes (1 second, 10 seconds, 10 minutes)
1 second makes it a bit moot. 10 seconds makes it worth it. 10 minutes means you're better off suggesting the user goes for a coffee break :-)
How many concurrent requests there will be
Unless you've got a "special" server, live push style systems tend to eat connections and you'll be maxed out pretty quickly. Having to throw more webservers in for a fancy progress bar might hurt the budget.
I've got some sample code on 871184 that shows a hand rolled "forever frame" which seems to work out well. The project I developed that for isn't hammered all that hard though, the operations take a few seconds and we can give pretty accurate percent. The code uses asp.net and jquery, but the general techniques will work with any server and javascript framework.
edit As John points out, status reporting probably isn't the job of the RESTful service. But there's nothing that says you can't open an iframe on the client that hooks to a page on the server that polls the service. Theory says the server and the service will at least be closer to one another :-)
Look into Comet. You make a single request to the server and the server blocks and holds the connection open until an update in status occurs. Once that happens the response is sent and committed. The browser receives this response, handles it and immediately re-requests the same URL. The effect is that of events being pushed to the browser. There are pros and cons and it might not be appropriate for all use cases but would provide the most timely status updates.
My opinion is to stick with the polling solution, but you might be interested in this Wikipedia article on HTTP Push technologies.
REST depends on HTTP, which is a request/response protocol. I don't think you're going to get a pure HTTP server calling the client back with status.
Besides, status reporting isn't the job of the service. It's up to the client to decide when, or if, it wants status reported.
One approach I have used is:
When the job is posted to the server, the server responds back a pubnub-channel id (one could alternatively use Google's PUB-SUB kind of service).
The client on browser subscribes to that channel and starts listening for messages.
The worker/task server publishes status on that pubnub channel to update the progress.
On receiving messages on the subscribed pubnub-channel, the client updates the web UI.
You could also use self-refreshing iframe, but AJAX call is much better. I don't think there is any other way.
PS: If you would open a socket from client, that wouldn't change much - PHP browser would show the page as still "loading", which is not very user-friendly. (assuming you would push or flush buffer to have other things displayed before)

Resources