I am looking for a library that will help me keep some state in sync between my server and my GUI in "real time". I have the messaging and middleware sorted (push updates etc.), but what I need is a protocol on top of that which guarantees that the data stays in sync within some reasonably finite period - an error / dropped message / exception might cause the data to go out of sync for a few seconds, but it should resync, or at least know it is out of sync, within a few seconds.
This seems like something that should have been solved before, but I can't seem to find anything suitable - any help is much appreciated.
More detail - I have a Rich Client GUI (Silverlight, but likely to move to JavaScript/C# or Java soon) that is served by JMS-type middleware.
I am looking to re-engineer some of the data interactions along the following lines.
Each user has their own view on several reasonably small data sets for items such as:
Entitlements (what GUI elements to display)
GUI data (e.g. to fill drop down menus etc)
Grids of business data (e.g. a grid of orders)
Preferences (e.g. how the GUI is laid out)
All of these data sets can be changed on the server at any time and the data should update on the client as soon as possible.
Data is changed via the server - the client asks for a change (e.g. cancel a request) and the server validates it against entitlements and business rules, updates its internal data set, and then sends the change back to the GUI. To provide user feedback, an interim state may be set on the GUI ("cancel submitted" or similar), which is then overridden by the server response.
At the moment the workflow is:
User authenticates
GUI downloads the initial data sets from the server (which either loads them from the database or some other business objects it has cached)
GUI renders
GUI downloads a snapshot of the business data
GUI subscribes to updates to the business data
As updates come in the GUI updates the model and view on screen
I am looking for a generalised library that would improve on this:
Should be cross language using an efficient payload format (e.g. Java back end, C# front end, protobuf data format)
Should be transport agnostic (we use a JMS style middleware we don’t want to replace right now)
The client should be sent an update when a change occurs to the server-side dataset
The client and server should be able to check for changes to ensure they are up to date
The data sent should be minimal (minimum delta)
Client and Server should cope with being more than one revision out of sync
The client should be able to cache to disk between sessions and then just get deltas on login.
I think the ideal solution would work something like this (see the sketch after this list):
Any object (or object tree) can be registered with the library code (this should work with data/objects loaded via Hibernate)
When the object changes, the library notifies a listener / callback with the change delta
The listener sends that delta to the client using my JMS
The client gets the update and hands it to the client-side version of the library, which updates the client-side version of the object
The client should get sufficient information from the update to be able to decide what UI action needs to be taken (notify user, update grid etc)
The client and server periodically check that they are on the same version of the object (e.g. the server sends the version number to the client) and can remediate if they differ, either by the server sending deltas or by a complete refresh.
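To make that concrete, here is a rough sketch of the API shape I have in mind; every name here is hypothetical, as far as I know no such library exists:

    // Server side: register an object tree; the library versions it and
    // hands change deltas to a callback for transport.
    // ('SyncRegistry', 'ordersGrid', 'jmsTopic' and 'encode' are stand-ins.)
    const registry = new SyncRegistry();

    registry.register('orders', ordersGrid, (delta) => {
      // delta: { id: 'orders', fromVersion: 41, toVersion: 42, changes: [...] }
      jmsTopic.publish(encode(delta)); // encode() could be protobuf
    });

    // Client side: apply deltas in order, falling back to a resync
    // when more than one revision behind.
    clientRegistry.onDelta('orders', (delta, localVersion) => {
      if (delta.fromVersion !== localVersion) {
        clientRegistry.requestResync('orders', localVersion);
      } else {
        clientRegistry.apply('orders', delta); // also fires UI listeners
      }
    });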
Thanks for any suggestions
Wow, that's a lot!
I have a project going on which deals with the synchronization aspect of this in JavaScript on the front end. There is a testing server written in Node.js (it was actually easy once the client was settled).
Basically data is stored by key in a dataset and every individual key is versioned. The Server has all versions of all data and the Client can be fed changes from the server. Version conflicts for when something is modified on both client and server are handled by a conflict resolution callback.
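To illustrate the scheme (a toy sketch only; this is not SyncIt's actual API):

    // Each key carries its own version; 'broadcast' and 'dirty' are
    // illustrative stand-ins for the transport and local-edit tracking.
    const store = {}; // key -> { version, value, dirty }

    function serverSet(key, value) {
      const current = store[key] || { version: 0 };
      store[key] = { version: current.version + 1, value: value };
      broadcast(key, store[key]); // push the change to subscribed clients
    }

    function clientReceive(key, update, resolveConflict) {
      const local = store[key];
      if (local && local.dirty) {
        // Modified on both client and server: let the app decide.
        store[key] = resolveConflict(local, update);
      } else {
        store[key] = update;
      }
    }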
It is not complete; in fact it only has in-memory stores at the moment, but that will change over the next week or so.
The actual notification/downloading and uploading is out of scope for the library, but you could just use Socket.IO for this.
It currently works with jQuery, Dojo and Node.js; really, it's got hardly any dependencies at all.
The project (with a demo) is located at https://github.com/forbesmyester/SyncIt
In implementing a browser-based simple game involving multiple users, I have the server save the game state at certain sync points (not time-based but event-specific). I identify each state by an integer.
When a user refreshes his browser, the server provides the latest state and restores the content in the browser. However, in the few seconds while the browser is loading the latest content after the refresh, the state could change again. I do not know how to handle this situation, because sending the next state will again raise the same issue.
I want a seamless refresh so none of the other players are impacted when one user refreshes his browser (or for that matter leaves and comes back).
The implementation language is not relevant. I use websockets to communicate between the browser and the server. The server is the intermediary for all communication between users (I am not using WebRTC data channels). What is the best way to sync the application content in multiple browsers?
This is indeed a programming-based question though no code is provided.
Forget the fact that your client exists in a browser. Let's just talk about replication.
The usual approach in databases is to separate snapshots from write-ahead logging (WAL). When you bring a new client up, you select a snapshot and transfer that. Then, when the client is ready, it asks for the WAL entries from that snapshot forward. The same mechanism is used after crashes: the last available snapshot is loaded, then the WAL is replayed, then the database comes up.
I would suggest the same strategy. This does require efficient storage of snapshots, some kind of log, and some kind of replay mechanism - which is a lot of easy-to-mess-up code. If you can use something existing, that would be good.
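If you do end up rolling it yourself, the core of the model is small. A minimal in-memory sketch (transport and persistence omitted):

    // Every change is appended to a log; the version is the log length.
    const state = {};
    const log = [];

    function applyChange(change) {
      log.push(change);
      Object.assign(state, change); // naive merge; real code would be richer
    }

    // New client: hand over a snapshot plus the version it corresponds to.
    function snapshot() {
      return { version: log.length, state: JSON.parse(JSON.stringify(state)) };
    }

    // Returning client at version v: replay only what it missed.
    function replayFrom(v) {
      return log.slice(v);
    }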
The first thing that I looked into was using Emscripten to compile Redis to JS, and then trying to use Redis' built-in asynchronous replication to replicate to your browser. That may be possible, but the fact that Redis is single-threaded and wants to be a client-server system is probably a showstopper.
The next best option that I found is that you can use https://isomorphic-git.org/. Here is how that could give you what you need. You simply maintain your current state in a git repository, and keep a WAL log of everything that you've done with it. When a client connects, it clones the repository. Once done, it connects to the websocket, tells you what commit it is at, and you send it the WAL log from that point forward. Locally in the browser you run those git commands. If the client simply loses its connection and then rejoins, it can do a git pull, and then follow the same strategy.
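A sketch of the browser side (isomorphic-git plus LightningFS for the in-browser filesystem; the repository URL and socket plumbing are placeholders):

    import git from 'isomorphic-git';
    import http from 'isomorphic-git/http/web';
    import LightningFS from '@isomorphic-git/lightning-fs';

    const fs = new LightningFS('game-state');
    const dir = '/state';

    async function bootstrap(socket) {
      // Clone the server's state repository into the browser.
      await git.clone({ fs, http, dir, url: 'https://example.com/state.git' });

      // Tell the server which commit we are at; it replies with the
      // WAL entries from that point forward.
      const head = await git.resolveRef({ fs, dir, ref: 'HEAD' });
      socket.send(JSON.stringify({ type: 'replay-from', commit: head }));
    }

    async function rejoin() {
      // After a dropped connection, catch up instead of re-cloning.
      await git.pull({ fs, http, dir, singleBranch: true,
                       author: { name: 'client' } });
    }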
This will be a bunch of work for you. But a lot less work than implementing everything from scratch.
We're building a microservice system in which new data can come from three (or more) different sources and eventually affects the end user.
The purpose of the system doesn't matter for this question, so I'll really try to keep it simple. Please see the attached diagram.
Data can come from the following sources:
Back-office site: defines the system and user configurations.
Main site: where users interact with the site and take actions.
External data sources: such as partners, which can give additional data (supplementary information) about users.
The services are:
Site back-office service: serves the back-office site.
User service: serves the main site.
Import service: imports additional data (supplementary information) from external sources.
User-cache service: syncs with the data of all of the above and combines it into pre-prepared cached responses. The reason for this is that the main site has to serve hundreds of millions of users with very low latency.
The main idea is:
Each microservice has its own db.
Each microservice can scale.
Each data change in any of the three parts affects the user and should be sent to the cache service so that it is eventually reflected on the main site.
The cache (Redis) holds all the data, combined into pre-prepared responses for the main site.
Each service's data changes will be published to a pub/sub topic for the cache service to update the Redis DB.
The system should serve around 200 million users.
So... the questions are:
Since the user-cache service can (and must) scale, what happens if, for example, there are two update messages waiting on pub/sub, one old and one new? How do we process only the new message, and prevent the case where one cache-service instance writes the new message's data to Redis and only afterwards another cache-service instance overrides it with the old message?
There is also the case where a cache-service instance needs to first read the current cached user data, modify it, and only then update the cache with the new data. How do we prevent the case where, for example, two instances read the current cache data while a third instance updates it with new data, and they then override it with their own data?
Is it at all possible to pre-prepare responses based on several sources which can change periodically? What is the right approach to this problem?
I'll try to address some of your points, let me know if I misunderstood what you're asking.
1) I believe you're asking how to enforce ordering of messages, so that an old update does not override a newer one. There is a "publish_time" field on each message (https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage) that lets you coordinate based on the time the Cloud Pub/Sub server received your publish request. If you wish to coordinate based on some other time or ordering mechanism, you can add an attribute to your PubsubMessage or payload to do so.
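For example, you could carry a version number as a message attribute and have consumers discard anything stale. Pushing the comparison into Redis as a Lua script makes the check atomic across your cache-service instances. A sketch, assuming ioredis; key and field names are illustrative:

    const Redis = require('ioredis');
    const redis = new Redis();

    // Runs atomically inside Redis: write only if the incoming version
    // is newer than the stored one (HSET with several fields needs Redis 4+).
    const SET_IF_NEWER = `
      local current = tonumber(redis.call('HGET', KEYS[1], 'version') or '-1')
      if tonumber(ARGV[1]) > current then
        redis.call('HSET', KEYS[1], 'version', ARGV[1], 'payload', ARGV[2])
        return 1
      end
      return 0
    `;

    async function applyUpdate(userId, version, payload) {
      const written = await redis.eval(SET_IF_NEWER, 1, 'user:' + userId,
                                       version, payload);
      if (written === 0) {
        console.log('dropped stale update v' + version + ' for ' + userId);
      }
    }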
2) This seems to be a general synchronization problem, not necessarily related to cloud pubsub; I'll leave this to others to answer.
3) Cloud dataflow implements a windowing and watermark mechanism similar to what you're describing. Perhaps you could use this to remove conflicting updates and perform preprocessing prior to writing them to the backing store.
https://beam.apache.org/documentation/programming-guide/#windowing
-Daniel
I would like to process some data in a Qt application. This data can be found on a web page which uses Ajax to dynamically update itself.
For example, the page itself is www.example.com, and it uses Ajax to load data from www.example.com/data, which is a plain text file. If I view www.example.com in a browser, I can clearly see when the data is updated.
The brute-force solution would be to just call the QWebView's load(QUrl("www.example.com/data")) every couple of seconds, or every time its loadFinished() signal is emitted, but that would be a waste of bandwidth, and I would be downloading the same data over and over. The time between updates could theoretically be a few seconds, but it could also be minutes, hours, or longer.
Is there a possibility to only reload the data when the page is updated?
The traditional AJAX model uses the following sequence of events:
Browser opens connection
Browser sends request
Server sends response
Server closes connection
Because the connection is closed, there is no way for the server to notify your browser if any data have changed. In order to get this information, you have no option but to query the server periodically.
As you mentioned in your question, this is not very efficient since you can waste a lot of bandwidth if nothing changes for a long while.
WebSockets is a more up-to-date technology that tries to overcome this inefficiency and Qt has a module that caters for this.
Unfortunately, it's not universal yet, so if you want to use WebSocket technology with a third-party server, you need to have traditional AJAX code to fall back on in case WebSockets are not supported.
EDIT:
Unfortunately, WebSockets are not the golden solution. It's still up to the server to have been programmed to send out notifications of changes. If the server does not have this feature, it won't matter if you're using WebSockets or traditional AJAX, you'll still have to keep querying for changes.
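For what it's worth, the server-side half can be quite small. A sketch in Node.js with the ws package (the watched file path is a placeholder):

    const { WebSocketServer, WebSocket } = require('ws');
    const fs = require('fs');

    const wss = new WebSocketServer({ port: 8080 });

    // Push the fresh data to every connected client whenever the file
    // changes, instead of waiting for clients to poll.
    fs.watch('/var/www/data', () => {
      const data = fs.readFileSync('/var/www/data', 'utf8');
      for (const client of wss.clients) {
        if (client.readyState === WebSocket.OPEN) client.send(data);
      }
    });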
Thanks everyone!
Recently I wanted to build a small CMS on Meteor, but I have some questions.
1. Caching: page cache, data cache, etc.
For example, when people search for an article,
on the server side:
    Meteor.publish('articles', function (keyword) {
      return Articles.find({ keyword: keyword });
    });
and on the client:
    Meteor.subscribe('articles', keyword);
That's OK, but...
The question is, every time people do this it invokes a Mongo query, which reduces performance.
In other frameworks using plain HTTP or HTTPS, people can rely on something like Squid or Varnish to cache the page or the data, so every time you route to a URL you read data from the cache server. But Meteor is built on SockJS / WebSockets, and I don't know how to cache through the socket... I tried Varnish, but saw no effect.
So maybe it ignores the WebSocket? Is there some method to cache the data - in MongoDB, on the server? Can I add some cache server?
2. Chat
I see the chatroom example in https://github.com/zquestz/simplechat
But unlike an implementation using Socket.IO, this example saves the chat messages in MongoDB, so the data flow is message -> Mongo -> query -> people. This invokes a Mongo query too!
With Socket.IO you just keep the socket in the context (or a server-side cache), so the data doesn't go through the DB.
My question is: is there a socket interface in Meteor, so I can do message -> socket -> people? And if not, how does the chatroom example's approach perform in a production environment? (I see it runs slowly...)
With Meteor, you don't have to worry about caching MongoDB queries; Meteor does that for you. Per the docs on data and security:
Every Meteor client includes an in-memory database cache. To manage the client cache, the server publishes sets of JSON documents, and the client subscribes to those sets. As documents in a set change, the server patches each client's cache.
[...]
Once subscribed, the client uses its cache as a fast local database, dramatically simplifying client code. Reads never require a costly round trip to the server. And they're limited to the contents of the cache: a query for every document in a collection on a client will only return documents the server is publishing to that client.
Because Meteor does poll the server every so often to see if the client's cache needs patching, you're probably seeing those polls happening every now and then. But they probably aren't very large requests. Additionally, due to a feature of Meteor called latency compensation, when you update a data source, the client immediately reflects the change without first waiting on the server. This reduces the appearance of lag for the user.
If you have many documents in Mongo, you may also be seeing them all get fetched if you still have the autopublish package enabled. You can fix that by removing it with meteor remove autopublish and writing code to publish only the relevant data instead of the entire database.
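For example (publication name and field list are purely illustrative):

    // After `meteor remove autopublish`, publish only what each client needs.
    Meteor.publish('myArticles', function () {
      return Articles.find(
        { owner: this.userId },
        { fields: { title: 1, keyword: 1, updatedAt: 1 } }
      );
    });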
If you really need to manage caching manually, the docs also go into that:
Sophisticated clients can turn subscriptions on and off to control how much data is kept in the cache and manage network traffic. When a subscription is turned off, all its documents are removed from the cache unless the same document is also provided by another active subscription.
Additional performance improvements to Meteor are currently being worked on, including a DDP-level proxy to support "very large number of clients". You can see more detail on this at the Meteor roadmap.
If you stumbled upon this question not because of a lack of understanding of Meteor's minimongo, but because you're interested in how to cache subscriptions that aren't needed right now (but may be in the future, and you don't want to keep their extra DDP overhead between client and server), there are two package options:
https://github.com/ccorcos/meteor-subs-cache
https://github.com/kadirahq/subs-manager
I was creating a mobile app and the database cache was not working, so I used Meteor's GroundDB package (https://github.com/raix/Meteor-GroundDB); now the database is always available locally whenever I restart the app.
You should also look into Meteor's appcache package to cache the entire app locally.
Our site was recently divided into several smaller sites, which are distributed across different IDCs.
One of these sites provides user authentication and other user-related services; the other sites access it through web services.
On every site that fetches data remotely, we make a local cache so that we don't have to go remote every time user information is needed.
What cache updating strategy would you recommend to ensure data integrity?
Since you need the update policy to be close to real time, you definitely need a cache-invalidation notification engine.
There are 2 possible implementation models for it:
1. Push
The main server pushes notification messages to the child servers, e.g. "resourceID=34392 is no longer valid in your cache".
Such a message should be sent on each data update on the main server.
2. Poll
Each child server asks the main server about a cache item's validity right before serving it to the user.
Of course, in this case, the main server should keep a list of the objects updated during the last cache-lifetime period, and respond to "has this object been updated?" requests very quickly.
As you can see, in both cases your main server should trigger an event on each data change.
In the first case this event is transferred via a 'notification bus' to the child servers, and in the second case it is stored in a recently-updated-objects list.
So both options need some code changes on the main server.
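As a sketch of the first model, with Redis pub/sub standing in for the notification bus (channel and key names are illustrative):

    const Redis = require('ioredis');

    // Main server: announce each change on the bus.
    const pub = new Redis();
    function onUserUpdated(userId) {
      pub.publish('cache-invalidation', JSON.stringify({ resourceId: userId }));
    }

    // Child server: drop the local copy so the next read goes remote.
    const sub = new Redis();
    const localCache = new Map();
    sub.subscribe('cache-invalidation');
    sub.on('message', (channel, message) => {
      localCache.delete(JSON.parse(message).resourceId);
    });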
For me, the second option is generally much easier to implement, but it very much depends on the software stack you're using.