How to sync a list on a server with many clients? - client-server

Consider a poker game server which hosts many tables. While a player is at the lobby he has a list of all the active tables and their stats. These stats constantly change while players join, play, and leave tables. Tables can be added and closed.
Somehow, these changes must be notified to the clients.
How would you implement this functionality?
Would you use TCP/UDP for the lobby (that is, should users connect to server to observe the lobby, or would you go for a request-response mechanism)?
Would the server notify clients about each event, or should the client poll the server?
Keep that in mind: Maybe the most important goal of such a system is scalability. It should be easy to add more servers in order to cope with growing awdience, while all the users should see one big list that consists from multiple servers.

This specific issue is a manifestation of a very basic issue in your application design - how should clients be connecting to the server.
When scalability is an issue, always resort to a scalable solution, using non-blocking I/O patterns, such as the Reactor design pattern. Much preferred is to use standard solutions which already have a working and tested implementation of such patterns.
Specifically in your case, which involves a fast-acting game which is constantly updating, it sounds reasonable to use a scalable server (again, non-blocking I/O), which holds a connection to each client via TCP, and updates him on information he needs to know.
Request-response cycle sounds less appropriate for your case, but this should be verified against your exact specifications for your application.

That's my basic suggestion:
The server updates the list (addition, removal, and altering exsisting items) through an interface that keeps a queue of a fixed length of operations that have been applied on the list. Each operation is given a timestamp. When the queue is full, the oldest operations are progressivly discarded.
When the user first needs to retrive the list, it asks the server to send him the complete list. The server sends the list with the current timestamp.
Once each an arbitary period of time (10-30 seconds?) the client asks the server to send him all the operations that have been applied to the list since the timestamp he got.
The server then checks if the timestamp still appears in the list (that is, it's bigger than the timestamp of the first item), and if so, sends the client the list of operations that have occured from that time to the present, plus the current timestamp. If it's too old, the server sends the complete list again.
UDP seems to suit this approach, since it's no biggy if once in a while an "update cycle" get's lost.

Related

Eventual consistency - how to avoid phantoms

I am new to the topic. Having read a handful of articles on it, and asked a couple of persons, I still do not understand what you people do in regard to one problem.
There are UI clients making requests to several backend instances (for now it's irrelevant whether sessions are sticky or not), and those instances are connected to some highly available DB cluster (may it be Cassandra or something else of even Elasticsearch). Say the backend instance is not specifically tied to one or cluster's machines, and instead its every request to DB may be served by a different machine.
One client creates some record, it's synchronously of asynchronously stored to one of cluster's machines then eventually gets replicated to the rest of DB machines. Then another client requests the list or records, the request ends up served by a distant machine not yet received the replicated changes, and so the client does not see the record. Well, that's bad but not yet ugly.
Consider however that the second client hits the machine which has the record, displays it in a list, then refreshes the list and this time hits the distant machine and again does not see the record. That's very weird behavior to observe, isn't it? It might even get worse: the client successfully requests the record, starts some editing on it, then tries to store the updates to DB and this time hits the distant machine which says "I know nothing about this record you are trying to update". That's an error which the user will see while doing something completely legitimate.
So what's the common practice to guard against this?
So far, I only see three solutions.
1) Not actually a solution but rather a policy: ignore the problem and instead speed up the cluster hard enough to guarantee that 99.999% of changes will be replicated on the whole cluster in, say, 0.5 secord (it's hard to imagine some user will try to make several consecutive requests to one record in that time; he can of course issue several reading requests, but in that case he'll probably not notice inconsistency between results). And even if sometimes something goes wrong and the user faces the problem, well, we just embrace that. If the loser gets unhappy and writes a complaint to us (which will happen maybe once a week or once an hour), we just apologize and go on.
2) Introduce an affinity between user's session and a specific DB machine. This helps, but needs explicit support from the DB, and also hurts load-balancing, and invites complications when the DB machine goes down and the session needs to be re-bound to another machine (however with proper support from DB I think that's possible; say Elasticsearch can accept routing key, and I believe if the target shard goes down it will just switch the affinity link to another shard - though I am not entirely sure; but even if re-binding happens, the other machine may contain older data :) ).
3) Rely on monotonic consistency, i.e. some method to be sure that the next request from a client will get results no older than the previous one. But, as I understand it, this approach also requires explicit support from DB, like being able so pass some "global version timestamp" to a cluster's balancer, which it will compare with it's latest data on all machines' timestamps to determine which machines can serve the request.
Are there other good options? Or are those three considered good enough to use?
P.S. My specific problem right now is with Elasticsearch; AFAIK there is no support for monotonic reads there, though looks like option #2 may be available.
Apache Ignite has primary partition for a key and backup partitions. Unless you have readFromBackup option set, you will always be reading from primary partition whose contents is expected to be reliable.
If a node goes away, a transaction (or operation) should be either propagated by remaining nodes or rolled back.
Note that Apache Ignite doesn't do Eventual Consistency but instead Strong Consistency. It means that you can observe delays during node loss, but will not observe inconsistent data.
In Cassandra if using at least quorum consistency for both reads and writes you will get monotonic reads. This was not the case pre 1.0 but thats a long time ago. There are some gotchas if using server timestamps but thats not by default so likely wont be an issue if using C* 2.1+.
What can get funny is since C* uses timestamps is things that occur at "same time". Since Cassandra is Last Write Wins the times and clock drift do matter. But concurrent updates to records will always have race conditions so if you require strong read before write guarantees you can use light weight transactions (essentially CAS operations using paxos) to ensure no one else updates between your read to update, these are slow though so I would avoid it unless critical.
In a true distributed system, it does not matter where your record is stored in remote cluster as long as your clients are connected to that remote cluster. In Hazelcast, a record is always stored in a partition and one partition is owned by one of the servers in the cluster. There could be X number of partitions in the cluster (by default 271) and all those partitions are equally distributed across the cluster. So a 3 members cluster will have a partition distribution like 91-90-90.
Now when a client sends a record to store in Hazelcast cluster, it already knows which partition does the record belong to by using consistent hashing algorithm. And with that, it also knows which server is the owner of that partition. Hence, the client sends its operation directly to that server. This approach applies on all client operations - put or get. So in your case, you may have several UI clients connected to the cluster but your record for a particular user is stored on one server in the cluster and all your UI clients will be approaching that server for their operations related to that record.
As for consistency, Hazelcast by default is strongly consistent distributed cache, which implies that all your updates to a particular record happen synchronously, in the same thread and the application waits until it has received acknowledgement from the owner server (and the backup server if backups are enabled) in the cluster.
When you connect a DB layer (this could be one or many different types of DBs running in parallel) to the cluster then Hazelcast cluster returns data even if its not currently present in the cluster by reading it from DB. So you never get a null value. On updating, you configure the cluster to send the updates downstream synchronously or asynchronously.
Ah-ha, after some even more thorough study of ES discussions I found this: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-preference.html
Note how they specifically highlight the "custom value" case, recommending to use it exactly to solve my problem.
So, given that's their official recommendation, we can summarise it like this.
To fight volatile reads, we are supposed to use "preference",
with "custom" or some other approach.
To also get "read your
writes" consistency, we can have all clients use
"preference=_primary", because primary shard is first to get all
writes. This however will probably have worse performance than
"custom" mode due to no distribution. And that's quite similar to what other people here said about Ignite and Hazelcast.
Right?
Of course that's a solution specifically for ES. Reverting to my initial question which is a bit more generic, turns out that options #2 and #3 are really considered good enough for many distributed systems, with #3 being possible to achieve with #2 (even without immediate support for #3 by DB).

An event store could become a single point of failure?

Since a couple of days I've been trying to figure it out how to inform to the rest of the microservices that a new entity was created in a microservice A that store that entity in a MongoDB.
I want to:
Have low coupling between the microservices
Avoid distributed transactions between microservices like Two Phase Commit (2PC)
At first a message broker like RabbitMQ seems to be a good tool for the job but then I see the problem of commit the new document in MongoDB and publish the message in the broker not being atomic.
Why event sourcing? by eventuate.io:
One way of solving this issue implies make the schema of the documents a bit dirtier by adding a mark that says if the document have been published in the broker and having a scheduled background process that search unpublished documents in MongoDB and publishes those to the broker using confirmations, when the confirmation arrives the document will be marked as published (using at-least-once and idempotency semantics). This solutions is proposed in this and this answers.
Reading an Introduction to Microservices by Chris Richardson I ended up in this great presentation of Developing functional domain models with event sourcing where one of the slides asked:
How to atomically update the database and publish events and publish events without 2PC? (dual write problem).
The answer is simple (on the next slide)
Update the database and publish events
This is a different approach to this one that is based on CQRS a la Greg Young.
The domain repository is responsible for publishing the events, this
would normally be inside a single transaction together with storing
the events in the event store.
I think that delegate the responsabilities of storing and publishing the events to the event store is a good thing because avoids the need of 2PC or a background process.
However, in a certain way it's true that:
If you rely on the event store to publish the events you'd have a
tight coupling to the storage mechanism.
But we could say the same if we adopt a message broker for intecommunicate the microservices.
The thing that worries me more is that the Event Store seems to become a Single Point of Failure.
If we look this example from eventuate.io
we can see that if the event store is down, we can't create accounts or money transfers, losing one of the advantages of microservices. (although the system will continue responding querys).
So, it's correct to affirmate that the Event Store as used in the eventuate example is a Single Point of Failure?
What you are facing is an instance of the Two General's Problem. Basically, you want to have two entities on a network agreeing on something but the network is not fail safe. Leslie Lamport proved that this is impossible.
So no matter how much you add new entities to your network, the message queue being one, you will never have 100% certainty that agreement will be reached. In fact, the opposite takes place: the more entities you add to your distributed system, the less you can be certain that an agreement will eventually be reached.
A practical answer to your case is that 2PC is not that bad if you consider adding even more complexity and single points of failures. If you absolutely do not want a single point of failure and wants to assume that the network is reliable (in other words, that the network itself cannot be a single point of failure), you can try a P2P algorithm such as DHT, but for two peers I bet it reduces to simple 2PC.
We handle this with the Outbox approach in NServiceBus:
http://docs.particular.net/nservicebus/outbox/
This approach requires that the initial trigger for the whole operation came in as a message on the queue but works very well.
You could also create a flag for each entry inside of the event store which tells if this event was already published. Another process could poll the event store for those unpublished events and put them into a message queue or topic. The disadvantage of this approach is that consumers of this queue or topic must be designed to de-duplicate incoming messages because this pattern does only guarantee at-least-once delivery. Another disadvantage could be latency because of the polling frequency. But since we have already entered the eventually consistent area here this might not be such a big concern.
How about if we have two event stores, and whenever a Domain Event is created, it is queued onto both of them. And the event handler on the query side, handles events popped from both the event stores.
Ofcourse every event should be idempotent.
But wouldn’t this solve our problem of the event store being a single point of entry?
Not particularly a mongodb solution but have you considered leveraging the Streams feature introduced in Redis 5 to implement a reliable event store. Take a look this intro here
I find that it has rich set of features like message tailing, message acknowledgement as well as the ability to extract unacknowledged messages easily. This surely helps to implement at least once messaging guarantees. It also support load balancing of messages using "consumer group" concept which can help with scaling the processing part.
Regarding your concern about being the single point of failure, as per the documentation, streams and consumer information can be replicated across nodes and persisted to disk (using regular Redis mechanisms I believe). This helps address the single point of failure issue. I'm currently considering using this for one of my microservices projects.

Broadcast Server

I am writing a TCP Server that accepts connections from multiple clients, this server gathers data from the system that it's running on and transmits it to every connected client.
What design patterns would be best for this situation?
Example
Put all connections in an array, then loop through the array and send the data to each client one by one. Advantage: very easy to implement. Disadvantage: not very efficient when handling large amounts of data.
An easier way is to use some existing software to do this ... For example use https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=d5bedadd-e46f-4c97-af89-22d65ffee070 .
In case you want to write on your own you will need a list(linked list) to manage the connections.
Here is an example of a server http://examples.oreilly.com/jenut/Server.java
If you want to handle large amounts of data, one of the techniques is to have a queue associated with each of the subscribers at the server end. A multi-threaded program can send the data to the clients from those queues.
A number of patterns have been developed for distributed processing and servers, for instance in the ACE project: http://www.cs.wustl.edu/~schmidt/patterns-ace.html. The design might be focused around the events which announce either that data has been received and may be read, or that buffers have been emptied and more data may now be written. At least in the days when a 32-bit address space was the rule, you could have many more open connections than you had threads, so you would typically have a small number of threads waiting for events which announced that they could safely read or write without stalling until the other side co-operated. This may come from events, or from calls such as select() or poll() terminating. Patterns are also associated with http://zguide.zeromq.org/page:all#-MQ-in-a-Hundred-Words.

synchronize data structures between unreliable client and server when data is too large for client

Summary:
How do I synchronize very large amount of data with a client which can't hold all the data in memory and keeps disconnecting?
Explanation:
I have a real-time (ajax/comet) app which will display some data on the web.
I like to think of this as the view being on the web and the model being on the server.
Say I have a large number of records on the server, all of them being added/removed/modified all the time. Here are the problems:
-This being the web, the client is likely to have many connections/disconnections. While the client is disconnected, data may have been modified and the client will need to be updated when reconnected. However, the client can't be sent ALL the data every time there is a re-connections, since the data is so large.
-Since there is so much data, the client obviously can't be sent all of it. Think of a gmail account with thousands of messages or google map with ... the whole world!
I realize that initially a complete snapshot of some relevant subset of data will be sent to the client, and thereafter only incremental updates. This will likely be done through some sort of sequence numbers...the client will say "the last update I received was #234" and the client will send all messages between #234 and #current.
I also realize that the client-view will notify the server that it is 'displaying' records 100-200 "so only send me those" (perhaps 0-300, whatever the strategy).
However, I hate the idea of coding all of this myself. There is a general enough and common enough problem that there must be libraries (or at least step-by-step recipes) already.
I am looking to do this either in Java or node.js. If solutions are available in other languages, I'll be willing to switch.
Try a pub/sub solution. Subscribe the client at a given start time to your server events.
The server logs all data change events based on the time they occur.
After a given tim eor reconnect of your client the client asks for a list of all changed data rows since last sync.
You can keep all the logic on the server and just sync the changes. Would result in a typical "select * from table where id in (select id from changed_rows where change_date > given_date)" statement on the server, which can be optimized.

Distributed time synchronization and web applications

I'm currently trying to build an application that inherently needs good time synchronization across the server and every client. There are alternative designs for my application that can do away with this need for synchronization, but my application quickly begins to suck when it's not present.
In case I am missing something, my basic problem is this: firing an event in multiple locations at exactly the same moment. As best I can tell, the only way of doing this requires some kind of time synchronization, but I may be wrong. I've tried modeling the problem differently, but it all comes back to either a) a sucky app, or b) requiring time synchronization.
Let's assume I Really Really Do Need synchronized time.
My application is built on Google AppEngine. While AppEngine makes no guarantees about the state of time synchronization across its servers, usually it is quite good, on the order of a few seconds (i.e. better than NTP), however sometimes it sucks badly, say, on the order of 10 seconds out of sync. My application can handle 2-3 seconds out of sync, but 10 seconds is out of the question with regards to user experience. So basically, my chosen server platform does not provide a very reliable concept of time.
The client part of my application is written in JavaScript. Again we have a situation where the client has no reliable concept of time either. I have done no measurements, but I fully expect some of my eventual users to have computer clocks that are set to 1901, 1970, 2024, and so on. So basically, my client platform does not provide a reliable concept of time.
This issue is starting to drive me a little mad. So far the best thing I can think to do is implement something like NTP on top of HTTP (this is not as crazy as it may sound). This would work by commissioning 2 or 3 servers in different parts of the Internet, and using traditional means (PTP, NTP) to try to ensure their sync is at least on the order of hundreds of milliseconds.
I'd then create a JavaScript class that implemented the NTP intersection algorithm using these HTTP time sources (and the associated roundtrip information that is available from XMLHTTPRequest).
As you can tell, this solution also sucks big time. Not only is it horribly complex, but only solves one half the problem, namely giving the clients a good notion of the current time. I then have to compromise on the server, either by allowing the clients to tell the server the current time according to them when they make a request (big security no-no, but I can mitigate some of the more obvious abuses of this), or having the server make a single request to one of my magic HTTP-over-NTP servers, and hoping that request completes speedily enough.
These solutions all suck, and I'm lost.
Reminder: I want a bunch of web browsers, hopefully as many as 100 or more, to be able to fire an event at exactly the same time.
Let me summarize, to make sure I understand the question.
You have an app that has a client and server component. There are multiple servers that can each be servicing many (hundreds) of clients. The servers are more or less synced with each other; the clients are not. You want a large number of clients to execute the same event at approximately the same time, regardless of which server happens to be the one they connected to initially.
Assuming that I described the situation more or less accurately:
Could you have the servers keep certain state for each client (such as initial time of connection -- server time), and when the time of the event that will need to happen is known, notify the client with a message containing the number of milliseconds after the beginning value that need to elapse before firing the event?
To illustrate:
client A connects to server S at time t0 = 0
client B connects to server S at time t1 = 120
server S decides an event needs to happen at time t3 = 500
server S sends a message to A:
S->A : {eventName, 500}
server S sends a message to B:
S->B : {eventName, 380}
This does not rely on the client time at all; just on the client's ability to keep track of time for some reasonably short period (a single session).
It seems to me like you're needing to listen to a broadcast event from a server in many different places. Since you can accept 2-3 seconds variation you could just put all your clients into long-lived comet-style requests and just get the response from the server? Sounds to me like the clients wouldn't need to deal with time at all this way ?
You could use ajax to do this, so yoǘ'd be avoiding any client-side lockups while waiting for new data.
I may be missing something totally here.
If you can assume that the clocks are reasonable stable - that is they are set wrong, but ticking at more-or-less the right rate.
Have the servers get their offset from a single defined source (e.g. one of your servers, or a database server or something).
Then have each client calculate it's offset from it's server (possible round-trip complications if you want lots of accuracy).
Store that, then you the combined offset on each client to trigger the event at the right time.
(client-time-to-trigger-event) = (scheduled-time) + (client-to-server-difference) + (server-to-reference-difference)
Time synchronization is very hard to get right and in my opinion the wrong way to go about it. You need an event system which can notify registered observers every time an event is dispatched (observer pattern). All observers will be notified simultaneously (or as close as possible to that), removing the need for time synchronization.
To accommodate latency, the browser should be sent the timestamp of the event dispatch, and it should wait a little longer than what you expect the maximum latency to be. This way all events will be fired up at the same time on all browsers.
Google found the way to define time as being absolute. It sounds heretic for a physicist and with respect to General Relativity: time is flowing at different pace depending on your position in space and time, on Earth, in the Universe ...
You may want to have a look at Google Spanner database: http://en.wikipedia.org/wiki/Spanner_(database)
I guess it is used now by Google and will be available through Google Cloud Platform.

Resources