RethinkDB: Can I replace an entire table?

I'd like to retrieve data from MySQL and feed it to RethinkDB for clients listening to changes. Can I just replace an entire table with the new results, with clients getting any removed, new, or updated objects?
r.table("tasks").replace( mysqlResults )
When I try it, it doesn't fail, but it doesn't seem to do anything either. I'm not sure whether I'm misunderstanding the usage or have a bad driver (php-rql).
EDIT: The more I read about it, the more I feel like I'm way off. Is there something with that functionality, though? Just tell the table what I want it to be, and have the changes automatically handle the inserts/updates/deletes?
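As far as I know there is no single ReQL call that diffs a whole table against a new result set. One workaround is to compute the diff yourself and write only the real changes, so clients listening for changes see only documents that actually changed. Below is a minimal Python sketch of the diff step only, assuming every row has a unique `id` field that is also the table's primary key; `diff_by_id` is a hypothetical helper, not part of any driver.

```python
from typing import Any

Doc = dict[str, Any]


def diff_by_id(current: list[Doc], desired: list[Doc], key: str = "id"):
    """Return (inserts, updates, delete_ids) that turn `current` into `desired`.

    Hypothetical helper: documents are matched on `key`, which is assumed to be
    the table's primary key and unique within each list.
    """
    cur = {doc[key]: doc for doc in current}
    new = {doc[key]: doc for doc in desired}
    # Present only in the new result set: insert.
    inserts = [doc for k, doc in new.items() if k not in cur]
    # Present in both but different: replace that one document.
    updates = [doc for k, doc in new.items() if k in cur and cur[k] != doc]
    # Present only in the table: delete by primary key.
    delete_ids = [k for k in cur if k not in new]
    return inserts, updates, delete_ids


current = [{"id": 1, "title": "a"}, {"id": 2, "title": "b"}]  # what the table holds now
desired = [{"id": 2, "title": "B"}, {"id": 3, "title": "c"}]  # fresh rows from MySQL
inserts, updates, delete_ids = diff_by_id(current, desired)
```

The three lists could then be sent with ReQL's `insert`, a per-key `get(...).replace(...)`, and `delete` commands (check your driver for the exact method names). Unchanged rows are never written at all.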


Cache and regularly update complex data

Let's start with the background. I have an API endpoint that I have to query every 15 minutes, and it returns complex data. Unfortunately, this endpoint does not say what exactly changed, so I have to compare its data with everything I have in the DB and then execute updates, inserts, or deletes. This is pretty tedious...
I came up with the idea of simply removing all data from certain tables and rebuilding everything from scratch... But I also have to return this cached data to my clients, so the DB might be empty during a client request because it is "refreshing/rebuilding". That can't happen, because I have to return something.
So I came up with the idea to:
Lock certain DB tables so that the client has to wait while the DB is refreshing
Do you have any suggestions how to solve the problem?
It sounds like you're using a relational database, so I'll try to outline a solution using database terms. The idea, however, is more general than that. In general, it's similar to Blue-Green deployment.
Have two data tables (or two databases, for that matter); one is active, and one is inactive.
When the software starts the update process, it can wipe the inactive table and write new data into it. During this process, the system keeps serving data from the active table.
Once the data update is entirely done, the system can begin to serve data from the previously inactive table. In other words, the inactive table becomes the active table, and vice versa.

Example microservice app with CQRS and Event Sourcing

I'm planning to create a simple microservice app (set and get appointments) with CQRS and Event Sourcing but I'm not sure if I'm getting everything correctly. Here's the plan:
1. Docker container: public delivery app with REST endpoints for getting and setting appointments. The endpoints for setting data trigger a RabbitMQ event (async); the endpoints for getting data call the command service (sync).
2. Docker container: the command service, with a connection to a SQL database for setting (and editing) appointments. It listens to the RabbitMQ events of the main app. A change doesn't overwrite the data but creates a new entry with a new version. When data changes, it also fires an event to sync the new data to the query service.
3. Docker container: the SQL database for the command service.
4. Docker container: the query service, with a connection to a MongoDB. It listens for changes in the command service to update its database. The main app should be able to request data from it, but not via REST. Via what, then?
5. Docker container: an event sourcing service that listens to all commands and stores them in a MongoDB.
6. Docker container: the event MongoDB.
Here are a couple of questions I don't get:
Let's say there is one appointment in the command database and it has already been synced to the query service. Now there is a call to change the title of this appointment, so the command service performs not an UPDATE but an INSERT with the same id and a new version number. What does it do afterwards? Read the new data from SQL and trigger an event with it? Does the query service listen and store the same data in its MongoDB? Does it overwrite the old data or also create a new entry with a version? That seems quite redundant. Do I really need the SQL database here?
How can the main app call for data from the query service if one doesn't want to use REST?
Because it stores all commands in the event DB (container 6), it is possible to restore every state by running all commands again in order. Is that "event sourcing"? Or is "event sourcing" not changing the data in SQL but creating a new version for each change instead? I'm confused about what exactly event sourcing is and where to apply it. Do I really need containers 5 (and 6) for event sourcing?
When a client wants to change something and afterwards show the changed data, the only way I see is to trigger the change and then wait (let's say with polling) for the query service to have that data. What's a good way to achieve that? Maybe checking for the existence of the future version number?
Is this whole structure a reasonable architecture or am I completely missing something?
Sorry, a lot of questions but thanks for any help!
Let’s take this one first.
Is this whole structure a reasonable architecture or am I completely missing something?
Nice architecture plan! I know it feels like there are a lot of moving pieces, but having lots of small pieces instead of one big one is what makes this my favorite pattern.
What is it doing afterwards? Reading the new data from the SQL and triggering an event with it? The query service is listening and storing the same data in its MongoDB? Is it overwriting the old data or also creating a new entry with a version? That seems to be quite redundant? Do I in fact really need the SQL database here?
There are 2 logical databases in CQRS (they can live in the same physical database, but for scaling reasons it's best if they don't): the domain model and the read model. These are very different structures. The domain model is stored as in any CRUD app, in third normal form, etc. The read model is meant to make data reads blazing fast by custom-designing tables that match the data a view needs, so there will be a lot of data duplication in these tables. The idea is that it's more responsive to have a table for each view and update that table when the domain model changes: at that moment nobody is sitting at a keyboard waiting for the view to render, so it's OK for the view model data generation to take a little longer. This wastes some CPU cycles, because you could update the view model several times before anyone asks for that view, but that's OK since we are really using up idle time anyway.
When a command updates an aggregate and persists it to the DB, it generates a message for the view side of CQRS to update the view. There are 2 ways to do this. The first is to send a message saying “aggregate 83483 needs to be updated” and the view model requeries everything it needs from the domain model and updates the view model. The other approach is to send a message saying “aggregate 83483 was updated to have the following values: …” and the read side can update its tables without having to query. The first approach requires fewer message types but more querying, while the second is the opposite. You can mix and match these two approaches in the same system.
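The two message styles might look like this on the read side. A minimal sketch, assuming plain dicts stand in for the domain and read databases; the handler names and the `title`/`starts_at` fields are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def handle_needs_update(aggregate_id: int, domain_db: dict[int, Row], read_db: dict[int, Row]) -> None:
    """Approach 1: the message only says which aggregate changed.

    The read side queries the domain model and rebuilds its view row.
    """
    row = domain_db[aggregate_id]
    read_db[aggregate_id] = {"id": aggregate_id, "title": row["title"], "starts_at": row["starts_at"]}


def handle_was_updated(aggregate_id: int, values: Row, read_db: dict[int, Row]) -> None:
    """Approach 2: the message carries the new values.

    The read side patches its view row without querying the domain model.
    """
    view = read_db.setdefault(aggregate_id, {"id": aggregate_id})
    view.update(values)


domain_db = {83483: {"title": "Dentist", "starts_at": "10:00", "version": 2}}
read_db: dict[int, Row] = {}
handle_needs_update(83483, domain_db, read_db)               # "83483 needs to be updated"
handle_was_updated(83483, {"title": "Dentist (moved)"}, read_db)  # "83483 now has these values"
```

Approach 1 needs only one message type per aggregate but a query back to the domain model; approach 2 needs richer messages but no query, which is the trade-off described above.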
Since the read side has very different table structures, you need both databases. On the read side, unless you want the user to be able to see old versions of the appointments, you only have to store the current state of the view so just update existing data. On the command side, keeping historical state using a version number is a good idea, but can make db size grow.
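To make the storage difference concrete, here is a sketch with two in-memory SQLite databases: the command side appends a new row per change (same `id`, next `version`), while the read side keeps one row per appointment and overwrites it. Table and column names are hypothetical.

```python
import sqlite3


def open_dbs() -> tuple[sqlite3.Connection, sqlite3.Connection]:
    write = sqlite3.connect(":memory:")
    # Command side: one row per (id, version), so history is kept.
    write.execute("CREATE TABLE appointment (id INTEGER, version INTEGER, title TEXT, PRIMARY KEY (id, version))")
    read = sqlite3.connect(":memory:")
    # Read side: one row per appointment, holding only the current state.
    read.execute("CREATE TABLE appointment_view (id INTEGER PRIMARY KEY, title TEXT, version INTEGER)")
    return write, read


def change_title(write: sqlite3.Connection, appointment_id: int, title: str) -> int:
    """Append a new version instead of UPDATE-ing; return the new version number."""
    with write:
        (current,) = write.execute(
            "SELECT COALESCE(MAX(version), 0) FROM appointment WHERE id = ?", (appointment_id,)
        ).fetchone()
        new_version = current + 1
        # If a concurrent writer already took this version, the composite key makes this INSERT fail.
        write.execute(
            "INSERT INTO appointment (id, version, title) VALUES (?, ?, ?)",
            (appointment_id, new_version, title),
        )
    return new_version


def project(read: sqlite3.Connection, appointment_id: int, title: str, version: int) -> None:
    """Overwrite the view row; old versions are not kept on the read side."""
    with read:
        read.execute(
            "INSERT OR REPLACE INTO appointment_view (id, title, version) VALUES (?, ?, ?)",
            (appointment_id, title, version),
        )


write, read = open_dbs()
for new_title in ("Dentist", "Dentist checkup"):
    version = change_title(write, 1, new_title)
    project(read, 1, new_title, version)
```

Storing the version on the read row as well is what lets a client later check whether its change has arrived.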
How can the main app call for data from the query service if one doesn't want to use REST?
How the request gets to the query side is unimportant, so you can use REST, postback, GraphQL or whatever.
Is that "event sourcing"?
Event Sourcing is when you persist all changes made to all entities. If the entities are small enough you can persist all properties, but in general events contain only the changes. To get the current state, you then add up all those changes to see what your entities look like at a certain point in time. It has nothing to do with the read model; that's CQRS. Note that an event is not the user's request to make a change: that is a message, which is then used to create a command. An event is a record of all fields that changed as a result of the command. That's an important distinction because you don't want to re-run all that business logic when rehydrating an entity or aggregate.
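"Adding up the changes" is a fold over the stored events in version order. A minimal sketch, assuming each event (a hypothetical `Event` type) carries only the fields that changed plus a per-aggregate version number; note that no business logic runs during replay.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass(frozen=True)
class Event:
    aggregate_id: str
    version: int
    changes: dict[str, Any]  # only the fields the command changed


def rehydrate(events: Iterable[Event], aggregate_id: str, up_to_version: Optional[int] = None) -> dict[str, Any]:
    """Rebuild an aggregate's state by applying its recorded changes in version order."""
    state: dict[str, Any] = {}
    mine = sorted((e for e in events if e.aggregate_id == aggregate_id), key=lambda e: e.version)
    for event in mine:
        if up_to_version is not None and event.version > up_to_version:
            break  # stop here to see the state "at a certain point in time"
        state.update(event.changes)
    return state


log = [
    Event("appt-1", 1, {"title": "Dentist", "starts_at": "10:00"}),
    Event("appt-1", 2, {"title": "Dentist checkup"}),
    Event("appt-2", 1, {"title": "Haircut", "starts_at": "14:00"}),
]
```

`rehydrate(log, "appt-1")` gives the current state, and passing `up_to_version=1` gives the state before the title change.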
When a client wants to change something and afterwards show the changed data, the only way I see is to trigger the change and then wait (let's say with polling) for the query service to have that data. What's a good way to achieve that? Maybe checking for the existence of the future version number?
Showing historical data is a bit sticky. I would push back on this requirement if you can, but sometimes it’s necessary. If you must do it, take the standard read model approach and save all changes to a view model table. If the circumstances are right you can cheat and read historical data directly from the domain model tables, but that’s breaking a CQRS rule. This is important because one of the advantages of CQRS is its scalability. You can scale the read side as much as you want if each read instance maintains its own read database, but having to read from the domain model will ruin this. This is situation dependent so you’ll have to decide on your own, but the best course of action is to try to get that requirement removed.
In terms of timing, CQRS is all about eventual consistency. The data changes may not show up on the read side for a while (typically fractions of a second but that's enough to cause problems). If you must show new and old data, you can poll and wait for the proper version number to appear, which is ugly. There are other alternatives involving result queues in Rabbit, but they are even uglier.
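If you do end up polling, bound it with a timeout so a lost message can't hang the request. A minimal sketch, assuming a hypothetical `read_version` callable that returns the version currently stored in the read model (or `None` if the row isn't there yet), and assuming the command side told the client which version its change produced.

```python
import time
from typing import Callable, Optional


def wait_for_version(
    read_version: Callable[[], Optional[int]],
    expected: int,
    timeout: float = 2.0,
    interval: float = 0.05,
) -> bool:
    """Poll the read model until it reports at least `expected`; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        current = read_version()  # None means the read side hasn't seen the row yet
        if current is not None and current >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Checking `>= expected` rather than `== expected` keeps the wait from failing if another change lands first.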

Using Hibernate/Spring, what's the best way to watch a table for changes to individual records?

Q: What is the proper way to watch a table for record level changes using Hibernate / Spring? The DB is a typical relational database system. Our intent is to move to an in-memory solution some time in the future but we can't do it just yet. Q: Are we on the right track or is there a better approach? Examples?
We've thought of two possibilities. One is to load and cache the whole table and the other is to implement a hibernate event listener. Problem is that we aren't interested in events originating in the current VM. What we are interested in is if someone else changes the table. If we load and cache the entire table we'll still have to figure out an efficient way to know when it changes so we may end up implementing both a cache and a listener. Of course a listener might not help us if it doesn't hear changes external to the VM. Our interest is in individual records which is to say that if a record changes, we want Java to update something else based on that record. Ideally we want to avoid re-loading the entire cache, assuming we use one, from scratch and instead update specific records in the cache as they change.
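One common way to notice changes made outside your VM, which the question doesn't mention, is to have every writer stamp changed rows with a value from a shared, increasing sequence and then poll for rows above the last value seen, updating only those cache entries. The sketch below uses Python and SQLite for brevity; in a Hibernate app the same idea would be a scheduled query. The `records` table and `change_seq` column are hypothetical. Caveats: deleted rows are invisible to this query unless you soft-delete them, and transactions that commit out of sequence order can be skipped, so real implementations often re-read a small window behind the high-water mark.

```python
import sqlite3


def poll_changes(conn: sqlite3.Connection, cache: dict[int, str], last_seen: int) -> int:
    """Copy rows changed since `last_seen` into the cache and return the new high-water mark.

    Assumes every writer, in any process, sets `change_seq` from a shared,
    increasing source (for example a database sequence or trigger).
    """
    rows = conn.execute(
        "SELECT id, payload, change_seq FROM records WHERE change_seq > ? ORDER BY change_seq",
        (last_seen,),
    ).fetchall()
    for rec_id, payload, seq in rows:
        cache[rec_id] = payload  # refresh just this entry, not the whole cache
        last_seen = seq  # rows are ordered, so the last one is the highest
    return last_seen


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT, change_seq INTEGER)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", [(1, "a", 1), (2, "b", 2)])
cache: dict[int, str] = {}
mark = poll_changes(conn, cache, 0)
# Simulate another application changing record 2.
conn.execute("UPDATE records SET payload = 'B', change_seq = 3 WHERE id = 2")
mark = poll_changes(conn, cache, mark)
```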

Realistic Data Backup method for

We are building an iOS app with, but still can't figure out the right way to back up data efficiently.
As a premise, we have, and will keep having, a LOT of datastore rows.
Say we have a class with 1 million rows, we have it backed up, and then we want to bring it back to Parse after a disaster (like data loss in production).
The few solutions we have considered are the following:
1) Use external server for backup
- use the REST API to constantly back up data to a remote MySQL server (we chose MySQL for customized analytics purpose, since it's way faster and easier to handle data with MySQL for us)
a) - recreate JSON objects from MySQL backup and use the REST API to send back to Parse.
Say we use the batch operation, which allows 50 objects to be created in one request, and assume each request takes 1 second: 1 million records will take about 5.5 hours to transfer to Parse.
b) - recreate one JSON file from MySQL backup and use the Dashboard to import data manually.
We just tried this method with a file of 700,000 records: it took about 2 hours for the loading indicator to stop and show the number of rows in the left pane, but the class never opens in the right pane (it says "operation time out"), and it has been over 6 hours since the upload started.
So we can't rely on 1.b, and 1.a seems to take too long to recover from a disaster (with 10 million records, it would take about 55 hours, or roughly 2.3 days).
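The recovery-time estimates follow from dividing the record count by the batch size and multiplying by the assumed 1 second per request:

```latex
\begin{aligned}
\frac{10^{6}\ \text{records}}{50\ \text{records/request}} &= 2\times10^{4}\ \text{requests}
  &&\Rightarrow\; 2\times10^{4}\ \text{s} \approx 5.6\ \text{h} \\
\frac{10^{7}\ \text{records}}{50\ \text{records/request}} &= 2\times10^{5}\ \text{requests}
  &&\Rightarrow\; 2\times10^{5}\ \text{s} \approx 55.6\ \text{h} \approx 2.3\ \text{days}
\end{aligned}
```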
Now we are thinking about the following:
2) Constantly replicate data to another app
Create the following in Parse:
- Production App: A
- Replication App: B
So while A is in production, every single query will be duplicated to B (using a constantly running background job).
The downside, of course, is that it will eat into A's burst limit, since it simply doubles the number of queries. So it's not ideal for scaling up.
What we want is something like AWS RDS which gives an option to automatically backup daily.
I wonder why this would be difficult for Parse, since it's based on AWS infrastructure.
Please let me know if you have any ideas on this; I'll be happy to share know-how.
We’ve noticed an important flaw in the above 2) idea.
If we replicate using the REST API, all the objectIds of all classes will change, so every one-to-one or one-to-many relation will break.
So we are thinking about adding a UUID to every object in every class.
Is there any problem about this method?
One thing we want to achieve is ( or in Obj-C “includeKey”), but I suppose that won’t be possible if we don’t base our app logic on objectId.
We're looking for a workaround for this issue: will UUID-based management work under Parse's datastore logic?
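The UUID idea can work because the UUID travels with the data while the objectId doesn't. A minimal, simplified sketch: objects are plain dicts, a pointer is stored as a bare objectId string in a hypothetical `author` field (real Parse pointers are richer objects), and `restore` only simulates a re-import that assigns fresh objectIds. After the import, the UUIDs let you rewrite every pointer to the new objectIds, so objectId-based features could keep working.

```python
import itertools
import uuid
from typing import Any, Iterator

Obj = dict[str, Any]


def restore(backup: list[Obj], new_ids: Iterator[int]) -> dict[str, Obj]:
    """Simulate a re-import: the target store assigns every object a fresh objectId."""
    store: dict[str, Obj] = {}
    for obj in backup:
        copy = dict(obj)
        copy["objectId"] = f"new{next(new_ids)}"
        store[copy["objectId"]] = copy
    return store


def remap_pointers(backup: list[Obj], store: dict[str, Obj], field: str) -> None:
    """Rewrite `field` from old objectIds to new ones, using the UUIDs as the bridge."""
    old_to_uuid = {o["objectId"]: o["uuid"] for o in backup}
    uuid_to_new = {o["uuid"]: oid for oid, o in store.items()}
    for obj in store.values():
        if obj.get(field) is not None:
            obj[field] = uuid_to_new[old_to_uuid[obj[field]]]


author = {"objectId": "a1", "uuid": str(uuid.uuid4()), "name": "Ann"}
post = {"objectId": "p1", "uuid": str(uuid.uuid4()), "author": "a1"}
backup = [author, post]
store = restore(backup, itertools.count(1))
restored_post = next(o for o in store.values() if o["uuid"] == post["uuid"])
dangling = restored_post["author"] not in store  # "a1" no longer exists after the re-import
remap_pointers(backup, store, "author")
```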
Parse has never lost production data. While we don't currently offer automated backups, you can request one any time you like, and we're working on making all of this even nicer. Additionally, it's easier in most cases to import the JSON export file through the data browser rather than using the REST batch.
I can confirm that today Parse did lose my data, or at least it appeared to.
After errors were detected on multiple apps (acknowledged by the Parse Status Twitter account), we could not retrieve data for one app, and no error was returned.
That was because an entire column of one of our classes (a pointer column) had disappeared, and its data was no longer shown in the dashboard.
We use this pointer column to filter and retrieve data, so the queries and collections came back empty.
So we decided to recreate the column manually. Luckily, recreating the column with the same name and type solved the issue, and the data was still there... I can't explain it, but I really thought, and the app behaved as if, the data was lost.
So an automated backup and restore feature is mandatory, not optional.
In December 2015, released a new dashboard with an improved export feature.
Just select your app and click "App Settings" -> "General" -> "Export app data". Parse generates a JSON file for every class in your app and emails you when the export is done.
Sad but true, is winding down:
I had the same issue backing up Parse Server data. Since Parse Server uses MongoDB, backing up data is not an issue. I did something simple: I downloaded the MongoDB backup from the server and then restored it using
mongorestore /path-to-mongodump (extracted files)
Because Parse has been open-sourced, we can adopt this technique.
For accidental deletes, writing a 'beforeDelete' cloud function that backs up the current row to another class would work.
For regular backups, a manual export of changed records (use a filter) will be useful. For recovery, this requires you to write scripts or use the import option (not so sure) in the data browser. You could also write a cloud function to replicate data to your backup server (haven't tried this yet).
However there are some limitations to cloud code that you should consider before venturing into it:

Data Synchronization from Relational Database to Couch DB

I need to synchronize my relational database (Oracle or MySQL) with CouchDB. Does anyone have any idea how this is possible? If it is possible, how can we notify CouchDB of changes made in the relational DB?
Thanks in advance.
First of all, you need to change the way you think about database modeling. Synchronizing to CouchDB is not just creating documents of all your tables, and pushing them to Couch.
I'm using CouchDB for a site in production, I'll describe what I did, maybe it will help you:
From the start, we have been using MySQL as our primary database. I had entities mapped out, including their relations. In an attempt to speed up the front-end I decided to use CouchDB as a content repository. The benefit was to have fully prepared documents, that contained all the relational data, so data could be fetched with much less overhead.
Because the documents can contain related entities (say, a question document that contains all its answers), I first decided which top-level entities I wanted to push to Couch. In my example, only questions would be pushed to Couch, and those documents would contain the answers and possibly some metadata, such as tags, user info, etc. When requesting a question on the frontend, I would only need to fetch one document to have all the information I need at that point.
Now for your second question: how to notify CouchDB of changes. In our case, all the changes in our data are made through a CMS. I have a single point in my code that all edit actions call. That's where I hooked in a function that persists the object being saved to CouchDB. The function determines whether this object needs persisting (i.e., whether it is a top-level entity), then creates a document of this object (think of some sort of toArray function) and fetches all its relations, recursively. The complete document is then pushed to CouchDB.
Now, in your case, the variables may be completely different, but the basic idea is the same: figure out which documents you want saved and what they look like. Then write a function that composes these documents, and make sure it is called whenever changes are made to your relational database.
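The "compose a document" step might look like this. A sketch, assuming a hypothetical relational schema (`question`, `answer`, `question_tag`) in SQLite standing in for MySQL; the returned dict is what you would serialize and push to CouchDB from the single save hook. The `question:<id>` document id scheme is an arbitrary choice.

```python
import sqlite3
from typing import Any


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical relational schema standing in for the MySQL side."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
        CREATE TABLE answer (id INTEGER PRIMARY KEY, question_id INTEGER, body TEXT);
        CREATE TABLE question_tag (question_id INTEGER, tag TEXT);
        INSERT INTO question VALUES (1, 'Sync to Couch?', 'How do I do it?');
        INSERT INTO answer VALUES (10, 1, 'Push whole documents'), (11, 1, 'Use an _update function');
        INSERT INTO question_tag VALUES (1, 'couchdb'), (1, 'mysql');
    """)
    return conn


def compose_question_doc(conn: sqlite3.Connection, question_id: int) -> dict[str, Any]:
    """Build one self-contained CouchDB document for a top-level entity (a question)."""
    row = conn.execute("SELECT id, title, body FROM question WHERE id = ?", (question_id,)).fetchone()
    if row is None:
        raise KeyError(question_id)
    answers = conn.execute(
        "SELECT id, body FROM answer WHERE question_id = ? ORDER BY id", (question_id,)
    ).fetchall()
    tags = [t for (t,) in conn.execute(
        "SELECT tag FROM question_tag WHERE question_id = ? ORDER BY tag", (question_id,)
    )]
    return {
        "_id": f"question:{row[0]}",  # arbitrary id scheme chosen for this sketch
        "type": "question",
        "title": row[1],
        "body": row[2],
        # Related rows are embedded, so the frontend needs only one fetch.
        "answers": [{"id": a_id, "body": body} for a_id, body in answers],
        "tags": tags,
    }


doc = compose_question_doc(make_demo_db(), 1)
```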
Notifying CouchDB of a change
CouchDB is very simple. Probably the easiest thing is directly updating an existing document. Two ways to implement this come to mind:
The easiest way is a normal CouchDB update: Fetch the current document by id; modify it; then send it back to Couch with HTTP PUT or POST.
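A standard-library Python sketch of that fetch-modify-PUT cycle. It relies on CouchDB's document API: `GET /{db}/{docid}` returns the document with its current `_rev`, and `PUT /{db}/{docid}` with that `_rev` in the body stores a new revision; a stale `_rev` gets 409 Conflict, which `urlopen` raises as an `HTTPError`. The server URL and database name are placeholders, and retrying on conflict is left out.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable

Doc = dict[str, Any]


def _q(s: str) -> str:
    return urllib.parse.quote(s, safe="")


def doc_url(base_url: str, db: str, doc_id: str) -> str:
    return f"{base_url}/{_q(db)}/{_q(doc_id)}"


def build_put(base_url: str, db: str, doc: Doc) -> urllib.request.Request:
    # The body must include the current _rev, otherwise CouchDB rejects the write with 409 Conflict.
    return urllib.request.Request(
        doc_url(base_url, db, doc["_id"]),
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def update_document(base_url: str, db: str, doc_id: str, modify: Callable[[Doc], None]) -> Doc:
    """Fetch the document (which includes _id and _rev), modify it, and PUT it back."""
    with urllib.request.urlopen(doc_url(base_url, db, doc_id)) as resp:
        doc = json.load(resp)
    modify(doc)
    with urllib.request.urlopen(build_put(base_url, db, doc)) as resp:
        return json.load(resp)  # CouchDB replies with the new revision in "rev"
```

For example, `update_document("http://localhost:5984", "questions", "question:1", lambda d: d.update(views=d.get("views", 0) + 1))` would bump a view counter.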
If you have clear application-specific changes (e.g. "the views value was incremented"), then writing an _update function seems prudent. Update functions are very simple: they receive an HTTP request and a document, they modify the document, and then CouchDB stores the new version. You write update functions in JavaScript and they run on the server. It is a great way to "compress" common actions into simpler (and fewer) HTTP requests.
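With an update function, the view-counter example becomes a single request instead of a GET plus a PUT. The function lives in a design document as JavaScript source and is called with `PUT /{db}/_design/{ddoc}/_update/{func}/{docid}`. The Python sketch below only builds both pieces; the design document name `app`, the function name `increment_views`, and the `views` field are hypothetical.

```python
import urllib.parse

# JavaScript source stored in the design document; CouchDB runs it server-side.
# Returning [null, ...] tells CouchDB not to save anything.
INCREMENT_VIEWS = """function (doc, req) {
  if (!doc) { return [null, 'missing']; }
  doc.views = (doc.views || 0) + 1;
  return [doc, 'ok'];
}"""

# Hypothetical design document; store it once with PUT /{db}/_design/app.
DESIGN_DOC = {"_id": "_design/app", "updates": {"increment_views": INCREMENT_VIEWS}}


def update_handler_url(base_url: str, db: str, ddoc: str, func: str, doc_id: str) -> str:
    """URL for PUT /{db}/_design/{ddoc}/_update/{func}/{docid}."""
    def q(s: str) -> str:
        return urllib.parse.quote(s, safe="")
    return f"{base_url}/{q(db)}/_design/{q(ddoc)}/_update/{q(func)}/{q(doc_id)}"
```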
