For the simplest example possible, let's say I'm pushing a list of my favorite foods to everyone who subscribes.
r.table('food').changes().run(conn, (err, cursor) => {
  if (err) throw err;
  cursor.each((err, change) => {
    if (err) throw err;
    io.emit('NEW_FAVORITE', change);
  });
});
Now let's say I have 500 people actively watching me add my favorite foods. What would be more performant, 500 people subscribed to 500 changefeeds that each have include_initial, or 500 initial queries pushed to those individuals & then 500 people watching 1 changefeed? Bonus points for explaining why!
You can't have multiple clients reading from one changefeed, so the only way to get 500 people watching one changefeed is to have a single client reading from that changefeed and then pushing to 500 people.
RethinkDB deduplicates changefeed messages inside the cluster if multiple clients are subscribed to the same table, so this isn't really any different than having 500 open changefeeds in terms of network traffic. The server will use a little more memory because it's tracking which changefeeds have which messages, but if you have one client reading from a changefeed and pushing to 500 people it would have to track that too.
The real reason to use include_initial though is that it prevents races. If you do a read and then open a changefeed, it's possible for a change to occur between the end of the read and when the changefeed starts. include_initial prevents that by atomically switching over from reading to passing on changes.
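For illustration, here is a minimal sketch of the per-client version (assuming the same conn and socket.io setup as in the snippet above; note the JS driver spells the option includeInitial):

// Hypothetical per-client wiring: each socket gets its own changefeed.
// includeInitial atomically streams the current rows first and then the
// changes, so no write can slip in between the read and the feed.
io.on('connection', (socket) => {
  r.table('food').changes({ includeInitial: true }).run(conn, (err, cursor) => {
    if (err) throw err;
    cursor.each((err, change) => {
      if (err) throw err;
      socket.emit('NEW_FAVORITE', change);
    });
    socket.on('disconnect', () => cursor.close());
  });
});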
(One complication is the case where you have 500 processes on machine A that want to read from a RethinkDB server on machine B. In that case there's a difference in network traffic between the two solutions because if you put one client on machine A reading from a changefeed and pushing to the processes, each change gets sent from B to A once and then transferred to the processes locally, while in the other case each change is sent 500 times over the network. If the network connection between A and B is slow compared to the time to transfer between processes on machine A, then that matters a lot. The best way to resolve that is to add a proxy node on machine A and open 500 changefeeds on that node, since RethinkDB will deduplicate the messages to the proxy node.)
Currently I have a REP/REQ model up and running in my code.
However, I do not need either side to send replies, so the replies are just wasting time. I don't know if that matters in the real world or not.
Basically it looks like this.
Client PCs - Connect - REQ
these guys all connect to the Server and update the Server with Info they have on a regular basis. They don't care if the Server didn't receive a particular message, nor do they need any info back from the Server.
There are many of these clients, but not an excessive number: say between 10 and 100. They probably won't all hit the same server; more likely one group hits one server and another group hits another. Clients send messages several times a second, but not much more than that. I have not really done any timing (I don't know how to measure below 1-2 ms resolution on my machine), so I don't know what to expect, what is feasible in terms of performance, or how many REQ clients can be served by one REP server.
Server PC - Bind - REP
this guy sits there running in a loop on his own separate thread waiting for REQs to come in. He sends replies to the REQs because he has to, not because he really wants to or needs to.
Alternate Models
From some googling, it seems PUSH/PULL is recommended if you just want to send messages and don't care about replies.
However, I couldn't figure out how to fit that into my architecture, because the binds and connects seemed to be reversed from what I need. I would like the bind to be on the server, because the client ("connect") machines are not always available to be reached.
Solutions
1) good alternate model
A good alternate model that works and is relatively simple would be great. I'm not sure there really is one but apart from REP/REQ and PUB/SUB I don't really know too much about other models.
2) I'm worrying about nothing?
If the replies from REP to REQ are always going to be really fast, and the reception of those replies by REQ is also really fast, then I guess I'm worrying about nothing. That would be good to know, so feel free to tell me if this is the case.
The Connection question
I don't really understand what connecting sockets does.
On a client REQ socket, should I connect at the start of each loop iteration, before sending that one single message? Or should I connect once before the loop, on the socket I also created before the loop?
I also don't understand what this means in terms of reliability: do I have to make special checks on connection status and reconnect, or is that done automatically?
To sum up
I have a "global" context.. created at the start, disposed of at the end
This daddy context has 1 or 2 sockets (connected to the same address, including port) - I'm still debugging this dual socket on the same address thing so I'm not sure if that is ok or it just doesn't work that way - clarification would be nice
These context(s) are lazy initialized and outside the loop scope, so we are not recreating sockets on a regular basis
connect calls for the sockets occur currently outside of the loop scope, but I'm not sure if it is not better to have them inside the loop scope.
I think I'm getting mixed up here.. I think the dual sockets are on my PUB/SUB model .. 1 PUB with 2 SUB sockets on each client, but anyhow please let me know if that would be a problem as well.
If you do not need Request-Reply, do not use it.
Request-Reply is generally slow because you need a round trip to the server for every message. This means you pay twice the network latency, i.e. the time a network packet needs to travel over the network. That does not matter while traffic is low, but it will become a bottleneck when traffic is high, for example multiple messages per second.
As you already mentioned Push-Pull is a valid alternative for one-way traffic. With Push-Pull you create a Pull socket on the server and bind it to an endpoint (this is similar to the Reply socket). You create a Push socket on the clients and connect it to the server endpoint (this is similar to the Request socket).
If you send multiple messages from the client to the same server, you should connect only once. Setting up a network connection is a costly operation because it requires multiple network round trips, at least for TCP.
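As a minimal sketch with the zeromq npm package (v6-style API; the port and message contents are placeholders):

const zmq = require('zeromq');

// Server: bind one PULL socket and drain incoming messages forever.
async function server() {
  const sock = new zmq.Pull();
  await sock.bind('tcp://*:5558');
  for await (const [msg] of sock) {
    console.log('received:', msg.toString());
  }
}

// Client: create and connect the PUSH socket once, outside the send loop.
// ZeroMQ reconnects automatically, so there is no need to re-connect or
// check connection status on every iteration.
async function client() {
  const sock = new zmq.Push();
  sock.connect('tcp://server-host:5558');
  setInterval(() => { sock.send(`status ${Date.now()}`); }, 200);
}

This also answers the connection question: connect once, before the loop, and let ZeroMQ handle reconnection behind the scenes.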
I am new to the topic. Having read a handful of articles on it and asked a couple of people, I still do not understand what you do about one problem.
There are UI clients making requests to several backend instances (for now it's irrelevant whether sessions are sticky or not), and those instances are connected to some highly available DB cluster (be it Cassandra or something else, even Elasticsearch). Say a backend instance is not specifically tied to one of the cluster's machines; instead, each of its requests to the DB may be served by a different machine.
One client creates some record; it's synchronously or asynchronously stored on one of the cluster's machines, then eventually replicated to the rest of the DB machines. Then another client requests the list of records, that request ends up served by a distant machine that has not yet received the replicated changes, and so the client does not see the record. Well, that's bad, but not yet ugly.
Consider however that the second client hits the machine which has the record, displays it in a list, then refreshes the list and this time hits the distant machine and again does not see the record. That's very weird behavior to observe, isn't it? It might even get worse: the client successfully requests the record, starts some editing on it, then tries to store the updates to DB and this time hits the distant machine which says "I know nothing about this record you are trying to update". That's an error which the user will see while doing something completely legitimate.
So what's the common practice to guard against this?
So far, I only see three solutions.
1) Not actually a solution but rather a policy: ignore the problem, and instead make the cluster fast enough to guarantee that 99.999% of changes will be replicated across the whole cluster in, say, 0.5 seconds (it's hard to imagine a user making several consecutive requests to one record in that time; they can of course issue several read requests, but then they'll probably not notice inconsistency between the results). And even if sometimes something goes wrong and a user faces the problem, well, we just embrace that. If the user gets unhappy and writes us a complaint (which will happen maybe once a week or once an hour), we just apologize and move on.
2) Introduce an affinity between a user's session and a specific DB machine. This helps, but it needs explicit support from the DB, hurts load-balancing, and invites complications when the DB machine goes down and the session needs to be re-bound to another machine (though with proper support from the DB I think that's possible; say, Elasticsearch can accept a routing key, and I believe if the target shard goes down it will just switch the affinity link to another shard, though I am not entirely sure; but even if re-binding happens, the other machine may contain older data :) ).
3) Rely on monotonic consistency, i.e. some method of being sure that the next request from a client will get results no older than the previous one. But as I understand it, this approach also requires explicit support from the DB, like being able to pass some "global version timestamp" to the cluster's balancer, which it will compare against its knowledge of all machines' latest data timestamps to determine which machines can serve the request.
Are there other good options? Or are those three considered good enough to use?
P.S. My specific problem right now is with Elasticsearch; AFAIK there is no support for monotonic reads there, though it looks like option #2 may be available.
Apache Ignite has a primary partition for each key, plus backup partitions. Unless you have the readFromBackup option set, you will always read from the primary partition, whose contents are expected to be reliable.
If a node goes away, a transaction (or operation) should be either propagated by remaining nodes or rolled back.
Note that Apache Ignite doesn't do Eventual Consistency but instead Strong Consistency. It means that you can observe delays during node loss, but will not observe inconsistent data.
In Cassandra, if you use at least quorum consistency for both reads and writes, you get monotonic reads. This was not the case pre-1.0, but that's a long time ago. There are some gotchas when using server timestamps, but that's not the default, so it likely won't be an issue if you're on C* 2.1+.
What can get funny, since C* uses timestamps, is things that occur at the "same time". Because Cassandra is last-write-wins, times and clock drift do matter. Concurrent updates to records will always have race conditions, so if you require strong read-before-write guarantees you can use lightweight transactions (essentially CAS operations using Paxos) to ensure no one else updates between your read and your update. These are slow, though, so I would avoid them unless critical.
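As a hedged sketch with the Node cassandra-driver (keyspace, table, and column names are made up):

const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'app', // made-up keyspace
});

// QUORUM writes plus QUORUM reads overlap in at least one replica,
// which is what yields the monotonic-read guarantee.
const opts = { prepare: true, consistency: cassandra.types.consistencies.quorum };

async function updateAndRead(id, body) {
  await client.execute('UPDATE records SET body = ? WHERE id = ?', [body, id], opts);
  return client.execute('SELECT body FROM records WHERE id = ?', [id], opts);
}

The lightweight-transaction variant would add an IF clause to the UPDATE (CAS via Paxos), at a noticeable latency cost.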
In a true distributed system, it does not matter where your record is stored in the remote cluster, as long as your clients are connected to that remote cluster. In Hazelcast, a record is always stored in a partition, and each partition is owned by one of the servers in the cluster. There can be X partitions in the cluster (271 by default), all equally distributed across the cluster, so a 3-member cluster will have a partition distribution like 91-90-90.
Now when a client sends a record to store in the Hazelcast cluster, it already knows which partition the record belongs to, via a consistent hashing algorithm. And with that, it also knows which server owns that partition, so the client sends its operation directly to that server. This applies to all client operations, put or get. So in your case, you may have several UI clients connected to the cluster, but the record for a particular user is stored on one server in the cluster, and all your UI clients will approach that server for operations related to that record.
As for consistency, Hazelcast is by default a strongly consistent distributed cache, which implies that all updates to a particular record happen synchronously, in the same thread, and the application waits until it has received acknowledgement from the owner server (and from the backup server, if backups are enabled) in the cluster.
When you connect a DB layer to the cluster (this could be one or many different types of DBs running in parallel), the Hazelcast cluster returns data even if it's not currently present in the cluster, by reading it from the DB, so you never get a null value. On updates, you configure the cluster to send the changes downstream synchronously or asynchronously.
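For flavor, a rough sketch with the hazelcast-nodejs-client (map and key names are made up); the client routes each key's operations to its partition owner:

const { Client } = require('hazelcast-client');

async function main() {
  // The client learns the partition table on connect, so put/get for a key
  // goes straight to the cluster member that owns that key's partition.
  const client = await Client.newHazelcastClient();
  const records = await client.getMap('records');

  await records.put('user-42', 'Alice'); // synchronous ack from the partition owner
  console.log(await records.get('user-42')); // served by the same owner

  await client.shutdown();
}

main();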
Ah-ha, after some even more thorough study of ES discussions I found this: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-preference.html
Note how they specifically highlight the "custom value" case, recommending it for exactly my problem.
So, given that's their official recommendation, we can summarise it like this.
To fight volatile reads, we are supposed to use "preference" with a "custom" value or some other approach.
To also get "read your writes" consistency, we can have all clients use "preference=_primary", because the primary shard is the first to get all writes. This however will probably have worse performance than the "custom" mode, due to no distribution. And that's quite similar to what other people here said about Ignite and Hazelcast.
Right?
Of course that's a solution specific to ES. Returning to my initial question, which is a bit more generic: it turns out that options #2 and #3 really are considered good enough for many distributed systems, with #3 achievable via #2 (even without direct support for #3 in the DB).
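For concreteness, a sketch with the official @elastic/elasticsearch Node client (v8-style API; the index name and user id are made up):

const { Client } = require('@elastic/elasticsearch');
const es = new Client({ node: 'http://localhost:9200' });

// Pinning preference to a per-user value means the same shard copies serve
// every search for that user, so consecutive reads cannot flip between
// replicas that are at different stages of replication.
async function listRecords(userId) {
  return es.search({
    index: 'records',
    preference: `user-${userId}`,
    query: { match_all: {} },
  });
}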
I'm trying to find an architecture for the following scenario. I'm building a REST service that performs some computation that can be quickly batch computed. Let's say that computing 1 "item" takes 50ms, and computing 100 "items" takes 60ms.
However, the nature of the client is that only 1 item needs to be processed at a time. So if I have 100 simultaneous clients, and I write the typical request handler that sends one item and generates a response, I'll end up using 5000ms, but I know I could compute the same in 60ms.
I'm trying to find an architecture that works well in this scenario. I.e., I would like to have something that merges data from many independent requests, processes that batch, and generates the equivalent responses for each individual client.
If you're curious, the service in question is python+django+DRF based, but I'm curious about what kind of architectural solutions/patterns apply here and if anything solving this is already available.
At first you could think of a reverse proxy detecting all pattern-specific queries, collecting them, and sending them to your application over an HTTP/1.1 pipeline (pipelining is a way to send a large number of queries one after another and receive all the HTTP responses, in the same order, at the end, without waiting for a response after each query).
But:
Pipelining is very hard to do well
you would have to code the reverse proxy yourself, as I do not know of one that does this
one slow response in the pipeline blocks all the responses behind it
you need an HTTP server able to hand several queries to your application language at once, which never happens if the HTTP server is not coded directly into your application, because HTTP is usually made to work on one query at a time (e.g. you never receive 2 queries in a PHP environment: you receive the first one, send the response, and then receive the next one, even if the connection contains 2 queries)
So the good idea would be to do this on the application side. You could identify matching queries and wait a small amount of time (10ms?) to see if other matching queries are incoming. You will need a way to communicate between several parallel workers here (say you have 50 application workers and 10 of them have received queries that could be treated in the same batch). This communication channel could be a database (a very fast one) or some shared memory, depending on the technology used.
Then, when too much time has been spent waiting (10ms?) or when a large number of queries have been received, one of the workers collects all the queries, runs the batch, and tells the other workers that a result is there (here again you need a central point of communication, like LISTEN/NOTIFY in PostgreSQL, a shared-memory segment, a message queue service, etc.).
Finally, each worker is responsible for sending the right HTTP response back to its own client.
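Within a single worker process, the collect-then-batch part is a small amount of code. Here is a sketch of the idea in Node (the 10ms window, the batch size, and batchCompute are all placeholders; the question's Python/Django stack would use the same pattern):

// Collect single-item requests for up to 10ms (or until 100 are queued),
// run one batch computation, then resolve each caller's promise with its
// own result.
const WINDOW_MS = 10;
const MAX_BATCH = 100;
let pending = [];
let timer = null;

function compute(item) {
  return new Promise((resolve, reject) => {
    pending.push({ item, resolve, reject });
    if (pending.length >= MAX_BATCH) flush();
    else if (!timer) timer = setTimeout(flush, WINDOW_MS);
  });
}

async function flush() {
  clearTimeout(timer);
  timer = null;
  const batch = pending;
  pending = [];
  try {
    // batchCompute: the hypothetical 60ms-per-100-items function.
    const results = await batchCompute(batch.map((p) => p.item));
    batch.forEach((p, i) => p.resolve(results[i]));
  } catch (err) {
    batch.forEach((p) => p.reject(err));
  }
}

Across several worker processes, the in-memory queue becomes the shared channel described above (LISTEN/NOTIFY, shared memory, a message queue).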
The key here is building a system where the time you lose coordinating the shared treatment of requests is smaller than the time saved by batching several queries together, and in the low-traffic case this overhead should stay reasonable (since there you will always lose time waiting for nothing). And of course you are also adding complexity to the system, making it harder to maintain, etc.
Summary:
How do I synchronize a very large amount of data with a client that can't hold all the data in memory and keeps disconnecting?
Explanation:
I have a real-time (ajax/comet) app which will display some data on the web.
I like to think of this as the view being on the web and the model being on the server.
Say I have a large number of records on the server, all of them being added/removed/modified all the time. Here are the problems:
- This being the web, the client is likely to have many connections/disconnections. While the client is disconnected, data may have been modified, and the client will need to be updated when it reconnects. However, the client can't be sent ALL the data on every re-connection, since the data is so large.
- Since there is so much data, the client obviously can't be sent all of it. Think of a Gmail account with thousands of messages, or Google Maps with ... the whole world!
I realize that initially a complete snapshot of some relevant subset of the data will be sent to the client, and thereafter only incremental updates. This will likely be done through some sort of sequence numbers: the client will say "the last update I received was #234" and the server will send all messages between #234 and #current.
I also realize that the client-view will notify the server that it is 'displaying' records 100-200 "so only send me those" (perhaps 0-300, whatever the strategy).
However, I hate the idea of coding all of this myself. This is a general enough and common enough problem that there must be libraries (or at least step-by-step recipes) already.
I am looking to do this either in Java or node.js. If solutions are available in other languages, I'll be willing to switch.
Try a pub/sub solution. Subscribe the client at a given start time to your server events.
The server logs all data change events based on the time they occur.
After a given time, or on reconnect, the client asks for a list of all data rows changed since the last sync.
You can keep all the logic on the server and just sync the changes. This would result in a typical "select * from table where id in (select id from changed_rows where change_date > given_date)" statement on the server, which can be optimized.
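A minimal in-memory sketch of that server-side change log, using sequence numbers as the sync point (all names are made up):

// The server keeps an append-only change log; clients sync by sequence number.
let seq = 0;
const changeLog = []; // { seq, id } entries, trimmed to a bounded length in practice

function recordChange(id) {
  changeLog.push({ seq: ++seq, id });
}

// Client: "the last update I received was #234".
function changesSince(lastSeq) {
  if (changeLog.length > 0 && lastSeq < changeLog[0].seq - 1) {
    return { resync: true }; // log was trimmed past lastSeq: send a fresh snapshot
  }
  return { changes: changeLog.filter((c) => c.seq > lastSeq) };
}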
Consider a poker game server which hosts many tables. While a player is at the lobby he has a list of all the active tables and their stats. These stats constantly change while players join, play, and leave tables. Tables can be added and closed.
Somehow, these changes must be notified to the clients.
How would you implement this functionality?
Would you use TCP or UDP for the lobby (that is, should users hold a connection to the server to observe the lobby, or would you go for a request-response mechanism)?
Would the server notify clients about each event, or should the client poll the server?
Keep this in mind: maybe the most important goal of such a system is scalability. It should be easy to add more servers in order to cope with a growing audience, while all users should still see one big list assembled from multiple servers.
This specific issue is a manifestation of a very basic issue in your application design - how should clients be connecting to the server.
When scalability is an issue, always resort to a scalable solution, using non-blocking I/O patterns, such as the Reactor design pattern. Much preferred is to use standard solutions which already have a working and tested implementation of such patterns.
Specifically in your case, which involves a fast-acting game which is constantly updating, it sounds reasonable to use a scalable server (again, non-blocking I/O), which holds a connection to each client via TCP, and updates him on information he needs to know.
Request-response cycle sounds less appropriate for your case, but this should be verified against your exact specifications for your application.
That's my basic suggestion:
The server updates the list (adding, removing, and altering existing items) through an interface that keeps a fixed-length queue of the operations that have been applied to the list. Each operation is given a timestamp. When the queue is full, the oldest operations are progressively discarded.
When the user first needs to retrieve the list, the client asks the server to send the complete list. The server sends the list along with the current timestamp.
Once every arbitrary period of time (10-30 seconds?), the client asks the server to send all the operations that have been applied to the list since the timestamp it received.
The server then checks whether that timestamp still appears in the queue (that is, whether it is bigger than the timestamp of the first item). If so, it sends the client the list of operations that have occurred from that time to the present, plus the current timestamp. If the timestamp is too old, the server sends the complete list again.
UDP seems to suit this approach, since it's no big deal if an "update cycle" gets lost once in a while.
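A compact sketch of that fixed-length operation queue, using a sequence counter as the "timestamp" (the queue length and field names are arbitrary):

const MAX_OPS = 1000; // fixed queue length
let stamp = 0;
const ops = []; // { stamp, type, table }
const tables = new Map(); // current lobby state

function apply(type, table) {
  ops.push({ stamp: ++stamp, type, table });
  if (ops.length > MAX_OPS) ops.shift(); // progressively discard the oldest
  if (type === 'remove') tables.delete(table.id);
  else tables.set(table.id, table); // add or alter
}

// Clients poll every 10-30 seconds with the last stamp they saw.
function sync(clientStamp) {
  if (ops.length > 0 && clientStamp < ops[0].stamp - 1) {
    return { full: [...tables.values()], stamp }; // too old: resend the complete list
  }
  return { ops: ops.filter((o) => o.stamp > clientStamp), stamp };
}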