Distributed time synchronization and web applications

I'm currently trying to build an application that inherently needs good time synchronization across the server and every client. There are alternative designs for my application that can do away with this need for synchronization, but my application quickly begins to suck when it's not present.
In case I am missing something, my basic problem is this: firing an event in multiple locations at exactly the same moment. As best I can tell, the only way of doing this requires some kind of time synchronization, but I may be wrong. I've tried modeling the problem differently, but it all comes back to either a) a sucky app, or b) requiring time synchronization.
Let's assume I Really Really Do Need synchronized time.
My application is built on Google AppEngine. While AppEngine makes no guarantees about the state of time synchronization across its servers, usually it is quite good, on the order of a few seconds (though that is still coarser than what NTP typically achieves); sometimes, however, it sucks badly, say, on the order of 10 seconds out of sync. My application can handle 2-3 seconds out of sync, but 10 seconds is out of the question with regard to user experience. So basically, my chosen server platform does not provide a very reliable concept of time.
The client part of my application is written in JavaScript. Again we have a situation where the client has no reliable concept of time either. I have done no measurements, but I fully expect some of my eventual users to have computer clocks that are set to 1901, 1970, 2024, and so on. So basically, my client platform does not provide a reliable concept of time.
This issue is starting to drive me a little mad. So far the best thing I can think to do is implement something like NTP on top of HTTP (this is not as crazy as it may sound). This would work by commissioning 2 or 3 servers in different parts of the Internet, and using traditional means (PTP, NTP) to try to ensure their sync is at least on the order of hundreds of milliseconds.
I'd then create a JavaScript class that implemented the NTP intersection algorithm using these HTTP time sources (and the associated roundtrip information that is available from XMLHttpRequest).
As you can tell, this solution also sucks big time. Not only is it horribly complex, but it only solves one half of the problem, namely giving the clients a good notion of the current time. I then have to compromise on the server, either by allowing the clients to tell the server the current time according to them when they make a request (big security no-no, but I can mitigate some of the more obvious abuses of this), or by having the server make a single request to one of my magic NTP-over-HTTP servers, and hoping that request completes speedily enough.
These solutions all suck, and I'm lost.
Reminder: I want a bunch of web browsers, hopefully as many as 100 or more, to be able to fire an event at exactly the same time.

Let me summarize, to make sure I understand the question.
You have an app that has a client and server component. There are multiple servers that can each be servicing many (hundreds) of clients. The servers are more or less synced with each other; the clients are not. You want a large number of clients to execute the same event at approximately the same time, regardless of which server happens to be the one they connected to initially.
Assuming that I described the situation more or less accurately:
Could you have the servers keep certain state for each client (such as initial time of connection -- server time), and when the time of the event that will need to happen is known, notify the client with a message containing the number of milliseconds after the beginning value that need to elapse before firing the event?
To illustrate:
client A connects to server S at time t0 = 0
client B connects to server S at time t1 = 120
server S decides an event needs to happen at time t2 = 500
server S sends a message to A:
S->A : {eventName, 500}
server S sends a message to B:
S->B : {eventName, 380}
This does not rely on the client time at all; just on the client's ability to keep track of time for some reasonably short period (a single session).
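The illustration above can be sketched as follows. This is only a minimal sketch, assuming all times are milliseconds on the server's clock; `delay_for_client` and the message layout are hypothetical names chosen for the example.

```python
def delay_for_client(event_time_ms: int, connect_time_ms: int) -> int:
    """Milliseconds the client should wait, counted from the moment it connected.

    Both arguments are readings of the server's clock, so the client's own
    wall-clock setting never enters the calculation.
    """
    delay = event_time_ms - connect_time_ms
    if delay < 0:
        raise ValueError("event is scheduled before the client connected")
    return delay


# Server-side bookkeeping from the illustration: connection time per client.
connect_times = {"A": 0, "B": 120}
event_time = 500

# One message per client, each carrying that client's own relative delay.
messages = {
    client: {"event": "eventName", "delay_ms": delay_for_client(event_time, t)}
    for client, t in connect_times.items()
}
```

On the client, a timer started at connection time would fire once it reaches `delay_ms`. The latency of the initial connection still shows up as error, so this only helps if that latency is small compared to the 2-3 seconds the question can tolerate.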

It seems to me like you need to listen to a broadcast event from a server in many different places. Since you can accept 2-3 seconds of variation, you could just put all your clients into long-lived comet-style requests and just get the response from the server? Sounds to me like the clients wouldn't need to deal with time at all this way?
You could use ajax to do this, so you'd be avoiding any client-side lockups while waiting for new data.
I may be missing something totally here.

If you can assume that the clocks are reasonably stable (that is, they may be set wrong, but they tick at more or less the right rate), you can work with offsets.
Have the servers get their offset from a single defined source (e.g. one of your servers, or a database server or something).
Then have each client calculate its offset from its server (with possible round-trip complications if you want lots of accuracy).
Store that, then use the combined offset on each client to trigger the event at the right time.
(client-time-to-trigger-event) = (scheduled-time) + (client-to-server-difference) + (server-to-reference-difference)
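A rough sketch of that calculation follows. It assumes each "difference" means (local clock minus remote clock), which is the sign convention that makes the plus signs in the formula work, and it assumes network latency is the same in both directions. The function names and all the numbers are made up for illustration.

```python
def estimate_difference(t_send: float, remote_time: float, t_recv: float) -> float:
    """Estimate (local clock - remote clock), assuming symmetric network latency.

    t_send and t_recv are local clock readings taken just before the request
    and just after the response; remote_time is the remote clock reading
    carried in the response.
    """
    local_midpoint = (t_send + t_recv) / 2.0
    return local_midpoint - remote_time


def client_trigger_time(scheduled: float,
                        client_to_server: float,
                        server_to_reference: float) -> float:
    """Translate a time on the reference clock into the client's own clock."""
    return scheduled + client_to_server + server_to_reference


# Hypothetical readings, in seconds.
# The client's clock runs 100 s ahead of its server; one-way latency is 0.2 s.
client_to_server = estimate_difference(t_send=1100.0, remote_time=1000.2, t_recv=1100.4)
# The server's clock runs 3 s behind the reference; one-way latency is 0.1 s.
server_to_reference = estimate_difference(t_send=1000.0, remote_time=1003.1, t_recv=1000.2)

# The event is scheduled for 2000.0 on the reference clock.
trigger = client_trigger_time(2000.0, client_to_server, server_to_reference)
```

If latency is not symmetric, the estimate is off by up to half the round-trip time, which is the "round-trip complication" mentioned above.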

Time synchronization is very hard to get right and in my opinion the wrong way to go about it. You need an event system which can notify registered observers every time an event is dispatched (observer pattern). All observers will be notified simultaneously (or as close as possible to that), removing the need for time synchronization.
To accommodate latency, the browser should be sent the timestamp of the event dispatch, and it should wait a little longer than what you expect the maximum latency to be. This way all events will be fired at the same time on all browsers.

Google found a way to treat time as absolute. That sounds heretical to a physicist, given General Relativity: time flows at a different pace depending on your position in space and time, on Earth, in the Universe ...
You may want to have a look at Google Spanner database: http://en.wikipedia.org/wiki/Spanner_(database)
I guess it is used now by Google and will be available through Google Cloud Platform.

How to get data without polling?

This is more of a theoretical question.
Well, imagine that I have two programs that work simultaneously. The main one only does something when it receives a flag set to true from a secondary program. So, this main program has a function that keeps asking the secondary one for the value of the flag, and when it gets true, it does something.
What I learned at college is that polling is the simplest way of doing that. But when I started working as a developer, coworkers told me that this method generates some overhead or is a waste of computation, because it asks for a value at regular intervals.
I tried to come up with some ideas for doing this in a different way and searched the internet for something like this, but didn't find a useful way to do it.
I read about interrupts and passive approaches in which the main program gets the data only when it is informed by the secondary program. But how does this happen? The main program will need a function to check for the interrupt, right? So won't it end up the same way as before?
What could I do differently?

There is no magic...
No program can guess when there is new information to read. What you can do is decide between two approaches:
A -> asks -> B
A <- is informed <- B
When to use each? It depends on many other factors, like:
1- How fast does the data need to be delivered from the moment it is generated? As fast as possible? Or can it be kept for a while and accumulated?
2- How fast is the data generated?
3- How many simultaneous clients are requesting data from the same server?
4- What type of data are you dealing with? Persistent? Fast-changing?
If you are building something like a stock analyzer where you need to ask for stock prices every second (and they also change every second), the approach you mentioned may be the best.
If you are writing a chat-based app like WhatsApp, where you need to check whether there is a new message for the client and most of the time there won't be, publish/subscribe may be the best.
But all of this is a very superficial look at a high-impact architecture decision; it is not possible to get the best result by looking at just one factor.
What I want to show is that the statement
"coworkers told me that this method generate some overhead or it's waste of computation"
is not right in general. It may be true in some particular scenarios, but overhead will always exist in distributed systems.

The typical way to prevent polling is by using the Publish/Subscribe pattern.
Your client program will subscribe to the server program and when an event occurs, the server program will publish to all its subscribers for them to handle however they need to.
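As a minimal in-process sketch of that pattern (the `Publisher` class and the `"flag"` topic are hypothetical names; two separate programs would additionally need a socket, pipe, or message broker between them):

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[object], None]


class Publisher:
    """Keeps a list of handlers per topic and calls them when something is published."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: object) -> int:
        """Deliver payload to every subscriber of topic; return how many were called."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


received = []
publisher = Publisher()

# The "main program" registers interest once instead of asking repeatedly.
publisher.subscribe("flag", received.append)

# The "secondary program" announces the change when it actually happens.
publisher.publish("flag", True)
```

The subscriber's code runs only when `publish` is called, so no cycles are spent checking a value that has not changed.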

If you flip the order of the requests you end up with something more similar to a standard web API. Your main program (left in your example) would be a server listening for requests. The secondary program would be a client hitting an endpoint on the server to trigger an event.
There are many ways to accomplish this in every language, and it doesn't have to be tied to TCP/IP requests.
I'll add a few links for you shortly.

Well, in most languages you won't implement this at such a low level. But theoretically speaking, there are different waiting strategies, and what you are describing is active waiting. Doing this you can easily eat all your CPU time.
Most languages provide libraries that let you start a process as a service which waits passively and is triggered when a request comes in.
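To make the difference concrete, here is a small sketch of passive waiting using Python's standard `threading.Event`. The two function names mirror the "main" and "secondary" programs from the question and are only illustrative; real separate processes would use an inter-process mechanism instead of threads.

```python
import threading

flag = threading.Event()
result = []


def main_program() -> None:
    # Passive wait: the thread sleeps until the flag is set, instead of
    # checking it in a loop. wait() returns True once the flag is set,
    # or False if the timeout expires first.
    if flag.wait(timeout=5.0):
        result.append("did something")


def secondary_program() -> None:
    # Setting the event wakes up every thread blocked in flag.wait().
    flag.set()


worker = threading.Thread(target=main_program)
worker.start()
secondary_program()
worker.join()
```

The waiting thread does not spin while it waits; the runtime and operating system wake it up when `set()` is called, which answers the "who checks for the interrupt?" part of the question.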

AJAX or WebSockets for a Hearthstone-like game?

Game is a one-vs-one turn based 2D card management game to be played in a browser.
It is very much like Hearthstone, where a player plays a number of cards, observes the effects and then passes the turn to the opponent.
Game mechanics and prototype are ready and I need to decide on technology.
Server is PHP + MySQL; I have heard of Node.js but have no experience with it.
I cannot have loss of packets, so need to use HTTP I guess.
Initial idea is to have scheduled AJAX call every 5 seconds to get game state for each client to check for:
end of turn
change of game state (and render animation based on it)
Obviously I would also need to validate every action of an active player on the server.
I am concerned about the number of calls to my server (not an expensive hosting) and how many calls a modest server would be capable of handling...
As a plus of Ajax I see guaranteed packet delivery and no issues with proxies involved (which may cut persistent connections).

WebSockets reduce latency and server workload (no need to open a new connection, which means a key exchange in the case of HTTPS), provided that you interact frequently.
A great advantage is that you are able to 'push' a message to the client (as opposed to having to 'pull' via Ajax every few seconds).
The server language shouldn't be a problem, but if you plan to maintain / extend it, you should choose carefully (I'm guessing you're a rather new programmer, thus gaining experience in a better suited environment would not be a huge amount of work).
Edit: just to clarify, I would recommend the usage of a websocket for your use case

Batching generation of http responses

I'm trying to find an architecture for the following scenario. I'm building a REST service that performs some computation that can be quickly batch computed. Let's say that computing 1 "item" takes 50ms, and computing 100 "items" takes 60ms.
However, the nature of the client is that only 1 item needs to be processed at a time. So if I have 100 simultaneous clients, and I write the typical request handler that sends one item and generates a response, I'll end up using 5000ms, but I know I could compute the same in 60ms.
I'm trying to find an architecture that works well in this scenario. I.e., I would like to have something that merges data from many independent requests, processes that batch, and generates the equivalent responses for each individual client.
If you're curious, the service in question is python+django+DRF based, but I'm curious about what kind of architectural solutions/patterns apply here and if anything solving this is already available.

At first you could think of a reverse proxy detecting all pattern-specific queries, collecting all these queries and sending them to your application in an HTTP 1.1 pipeline (pipelining is a way to send a large number of queries one after another and receive all the HTTP responses in the same order at the end, without waiting for a response after each query).
But:
pipelining is very hard to do well
you would have to code the reverse proxy yourself, as I do not know an existing way to do it
one slow response in the pipeline blocks all the other responses
you need an HTTP server able to hand several queries to your application at once, which never happens unless the HTTP server is coded directly in your application, because usually HTTP handling works on one query at a time (for example, you never receive 2 queries in a PHP environment: you receive the first one, send the response, and then receive the next one, even if the connection contains 2 queries).
So the good idea would be to do that on the application side. You could identify matching queries, and wait for a small amount of time (10ms?) to see if some other queries are also incoming. You will need a way to communicate between several parallel workers here (like you have 50 application workers and 10 of them have received queries that could be treated in the same batch). This way of communication could be a database (a very fast one) or some shared memory, depends on the technology used.
Then, when too much time has been spent waiting (10ms?) or when a large number of queries has been received, one of the workers could collect all the queries, run the batch, and tell all the other workers that a result is there (here again you need a central point of communication, like LISTEN/NOTIFY in PostgreSQL, a shared memory thing, a message queue service, etc.).
Finally every worker is responsible for sending the right HTTP response.
The key here is having a system where the time you lose coordinating shared request handling is smaller than the time saved by batching several queries together, and where this time stays reasonable under low traffic (since then you will always lose time waiting for nothing). And of course you are also adding some complexity to the system, making it harder to maintain, etc.
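The collect-then-batch idea can be sketched in a single process with `asyncio`. This is a simplification under stated assumptions: all requests are handled by one event loop, which sidesteps the inter-worker communication described above, and `compute_batch`, `MicroBatcher` and the 10 ms / 100 item limits are hypothetical stand-ins rather than anything from Django or DRF.

```python
import asyncio
from typing import List, Tuple


def compute_batch(items: List[int]) -> List[int]:
    """Stand-in for the real batch computation (here it just squares each item)."""
    return [item * item for item in items]


class MicroBatcher:
    def __init__(self, max_wait: float = 0.01, max_size: int = 100) -> None:
        self.max_wait = max_wait          # how long to wait for more requests
        self.max_size = max_size          # flush early once this many are queued
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded only to make the demo observable

    async def submit(self, item: int) -> int:
        """Called once per request; resolves when the batch containing it is done."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request exists, then open the wait window.
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait
            while len(batch) < self.max_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            results = compute_batch([item for item, _ in batch])
            self.batch_sizes.append(len(batch))
            # Hand each caller its own result.
            for (_, future), result in zip(batch, results):
                future.set_result(result)


async def demo() -> Tuple[List[int], List[int]]:
    batcher = MicroBatcher()
    worker = asyncio.create_task(batcher.run())
    answers = await asyncio.gather(*(batcher.submit(n) for n in range(5)))
    worker.cancel()
    return list(answers), batcher.batch_sizes


answers, batch_sizes = asyncio.run(demo())
```

Each caller still sees a plain one-item-in, one-result-out interface; the cost is that a lone request waits up to `max_wait` before it is processed, which is the low-traffic trade-off mentioned above.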

Ajax use on website design

I just want to ask about your experience. I'm designing a public website that uses jQuery Ajax for most operations. I'm getting some timeouts, and I think the hosting provider may be the cause. Does anyone have experience with this and can give me some hints (especially on handling timeouts)?
Thanks in advance to all.
Esteve

If you have a half-decent host, chances are these aren't network timeouts but are rather due to insufficient hardware which causes your server-side scripts to take too long to answer. For example if you have an autocomplete field and the script goes through a database of 100,000 entries, this is a breeze for newer servers but older "budget" servers or overcrowded shared hosting servers might croak on it.
Depending on what your Ajax operations are, you may be able to break them down in shorter chunks. If you're doing database queries for example, use LIMIT and OFFSET and only return say, 5 entries at a time. When those 5 entries arrive on the client, make another Ajax call for 5 more, so from the user's point of view the entries will keep coming in and it will look fluid (instead of waiting 30s and possibly timing out before they see all entries at once). If you do this make sure you display a spiffy web 2.0 turning wheel to let the user know if they should be waiting some more or if it's done.
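A small sketch of the LIMIT/OFFSET chunking, using an in-memory SQLite database so it runs on its own. The `entries` table, its columns and the `fetch_chunk` helper are invented for the example; on a real site each call to `fetch_chunk` would be one Ajax request handled by the server-side script.

```python
import sqlite3
from typing import List


def fetch_chunk(conn: sqlite3.Connection, limit: int, offset: int) -> List[str]:
    """Return at most `limit` names, skipping the first `offset` rows."""
    rows = conn.execute(
        "SELECT name FROM entries ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return [name for (name,) in rows]


# Hypothetical data set with 12 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO entries (name) VALUES (?)",
    [(f"entry-{i}",) for i in range(12)],
)

# The client keeps requesting the next 5 entries until an empty chunk comes back.
chunks = []
offset = 0
while True:
    chunk = fetch_chunk(conn, limit=5, offset=offset)
    if not chunk:
        break
    chunks.append(chunk)
    offset += len(chunk)
```

The `ORDER BY` matters here: without a stable order, consecutive chunks could overlap or skip rows.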

How to sync a list on a server with many clients?

Consider a poker game server which hosts many tables. While a player is in the lobby, he has a list of all the active tables and their stats. These stats constantly change while players join, play, and leave tables. Tables can be added and closed.
Somehow, the clients must be notified of these changes.
How would you implement this functionality?
Would you use TCP/UDP for the lobby (that is, should users connect to server to observe the lobby, or would you go for a request-response mechanism)?
Would the server notify clients about each event, or should the client poll the server?
Keep this in mind: maybe the most important goal of such a system is scalability. It should be easy to add more servers in order to cope with a growing audience, while all the users should see one big list that consists of tables from multiple servers.

This specific issue is a manifestation of a very basic issue in your application design - how should clients be connecting to the server.
When scalability is an issue, always resort to a scalable solution, using non-blocking I/O patterns, such as the Reactor design pattern. Much preferred is to use standard solutions which already have a working and tested implementation of such patterns.
Specifically in your case, which involves a fast-acting game which is constantly updating, it sounds reasonable to use a scalable server (again, non-blocking I/O), which holds a connection to each client via TCP, and updates him on information he needs to know.
Request-response cycle sounds less appropriate for your case, but this should be verified against your exact specifications for your application.

That's my basic suggestion:
The server updates the list (adding, removing, and altering existing items) through an interface that keeps a fixed-length queue of the operations that have been applied to the list. Each operation is given a timestamp. When the queue is full, the oldest operations are progressively discarded.
When the user first needs to retrieve the list, the client asks the server to send the complete list. The server sends the list with the current timestamp.
Once every arbitrary period of time (10-30 seconds?) the client asks the server to send all the operations that have been applied to the list since the timestamp it got.
The server then checks if the timestamp still appears in the queue (that is, it's bigger than the timestamp of the first item), and if so, sends the client the list of operations that have occurred from that time to the present, plus the current timestamp. If it's too old, the server sends the complete list again.
UDP seems to suit this approach, since it's no biggie if once in a while an "update cycle" gets lost.
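A rough sketch of the server side of this scheme. It assumes a simple counter as the timestamp instead of wall-clock time, and only "add" and "remove" operations; the `ListLog` class and its method names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional, Set, Tuple

Operation = Tuple[int, str, str]  # (timestamp, action, table name)


class ListLog:
    def __init__(self, max_ops: int) -> None:
        self.tables: Set[str] = set()
        # deque(maxlen=...) silently drops the oldest operation when full.
        self.ops: Deque[Operation] = deque(maxlen=max_ops)
        self.clock = 0  # logical timestamp, incremented per operation

    def apply(self, action: str, table: str) -> None:
        self.clock += 1
        if action == "add":
            self.tables.add(table)
        elif action == "remove":
            self.tables.discard(table)
        self.ops.append((self.clock, action, table))

    def snapshot(self) -> Tuple[List[str], int]:
        """Complete list plus the current timestamp, for a client's first request."""
        return sorted(self.tables), self.clock

    def changes_since(self, timestamp: int) -> Optional[List[Operation]]:
        """Operations newer than `timestamp`, or None if the queue no longer
        reaches back that far and the client must fetch a full snapshot."""
        oldest_kept = self.ops[0][0] if self.ops else self.clock + 1
        if timestamp < oldest_kept - 1:
            return None
        return [op for op in self.ops if op[0] > timestamp]


log = ListLog(max_ops=3)
log.apply("add", "table-1")
log.apply("add", "table-2")
tables, seen = log.snapshot()          # client's initial full fetch

log.apply("add", "table-3")
log.apply("remove", "table-1")
recent = log.changes_since(seen)       # incremental update

log.apply("add", "table-4")            # pushes the oldest kept operation out
too_old = log.changes_since(1)         # this client fell too far behind
```

A client that receives `None` falls back to `snapshot()`, which corresponds to the "send the complete list again" case above.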
