Data Synchronization from Relational Database to CouchDB - Oracle

I need to synchronize my relational database (Oracle or MySQL) to CouchDB. Does anyone have any idea how this is possible? If it is possible, how can we notify CouchDB of any changes that happen on the relational DB?
Thanks in advance.

First of all, you need to change the way you think about database modeling. Synchronizing to CouchDB is not just creating documents from all your tables and pushing them to Couch.
I'm using CouchDB for a site in production. I'll describe what I did; maybe it will help you:
From the start, we have been using MySQL as our primary database. I had entities mapped out, including their relations. In an attempt to speed up the front-end I decided to use CouchDB as a content repository. The benefit was having fully prepared documents that contained all the relational data, so data could be fetched with much less overhead.
Because the documents can contain related entities - say a question document that contains all answers - I first decided what top-level entities I wanted to push to Couch. In my example, only questions would be pushed to Couch, and those documents would contain the answers, and possibly some metadata, such as tags, user info, etc. When requesting a question on the frontend, I would only need to fetch one document to have all the information I need at that point.
Now for your second question: how to notify CouchDB of changes. In our case, all the changes in our data are done using a CMS. I have a single point in my code which all edit actions call. That's the place where I hooked in a function that persisted the object being saved to CouchDB. The function determines if this object needs persisting (i.e., is it a top-level entity), then creates a document from this object (think of some sort of toArray function), and fetches all its relations, recursively. The complete document is then pushed to CouchDB.
Now, in your case, the variables here may be completely different, but the basic idea is the same: figure out what documents you want saved and what they should look like. Then write a function that composes these documents, and make sure it is called whenever changes are made to your relational database.
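To make that concrete, here is a rough sketch of such a function for a "question" entity. The entity shape, the loadAnswers/loadTags helpers and the CouchDB URL are all illustrative, not exactly what I used:

```javascript
// Sketch: compose one self-contained document from an entity and its
// relations, then push it to CouchDB. Names below are assumptions.
async function persistQuestion(question, { loadAnswers, loadTags }) {
  const doc = {
    _id: `question:${question.id}`,
    type: 'question',
    title: question.title,
    body: question.body,
    answers: await loadAnswers(question.id),  // embedded related entities
    tags: await loadTags(question.id),
    updatedAt: new Date().toISOString()
  };

  // Reuse the existing revision (if any) so the PUT replaces the old document.
  const url = `http://localhost:5984/content/${encodeURIComponent(doc._id)}`;
  const existing = await fetch(url);
  if (existing.ok) {
    doc._rev = (await existing.json())._rev;
  }

  await fetch(url, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(doc)
  });
}
```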
Notifying CouchDB of a change
CouchDB is very simple. Probably the easiest thing is directly updating an existing document. Two ways to implement this come to mind:
The easiest way is a normal CouchDB update: Fetch the current document by id; modify it; then send it back to Couch with HTTP PUT or POST.
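For example, a plain read-modify-write against a hypothetical "content" database might look like the following. The host, database name and the "views" field are assumptions:

```javascript
// Sketch of the ordinary CouchDB read-modify-write cycle.
async function incrementViews(docId) {
  const url = `http://localhost:5984/content/${encodeURIComponent(docId)}`;

  // 1. Fetch the current document (including its _rev).
  const doc = await (await fetch(url)).json();

  // 2. Modify it in memory.
  doc.views = (doc.views || 0) + 1;

  // 3. PUT it back; CouchDB rejects the write if _rev is stale (409 Conflict).
  const res = await fetch(url, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(doc)
  });
  if (res.status === 409) {
    console.warn('conflict: document changed underneath us; re-fetch and retry');
  }
}
```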
If you have clear application-specific changes (e.g. "the views value was incremented") then writing an _update function seems prudent. Update functions are very simple: they receive an HTTP request and a document; they modify the document; and then CouchDB stores the new version. You write update functions in JavaScript and they run on the server. They are a great way to "compress" common actions into simpler (and fewer) HTTP requests.
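A sketch of such an update function, stored under "updates" in a design document; the "views" field is again just an example:

```javascript
// CouchDB update function: receives the stored doc and the HTTP request,
// returns the new doc plus the response body.
function (doc, req) {
  if (!doc) {
    // called with an unknown id: don't create anything, just report it
    return [null, 'document not found'];
  }
  doc.views = (doc.views || 0) + 1;
  return [doc, 'ok'];
}
```

You would then call it with something like PUT /content/_design/app/_update/increment-views/{docid} (database and design document names assumed here).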

Related

Example microservice app with CQRS and Event Sourcing

I'm planning to create a simple microservice app (set and get appointments) with CQRS and Event Sourcing, but I'm not sure if I'm understanding everything correctly. Here's the plan:
docker container: public delivery app with REST endpoints for getting and setting appointments. The endpoints for setting data trigger a RabbitMQ event (async); the endpoints for getting data call the query service (sync).
docker container: the command service with a connection to a SQL database for setting (and editing) appointments. It's listening to the RabbitMQ events from the main app. A change doesn't overwrite the data but creates a new entry with a new version. When data has changed it also fires an event to sync the new data to the query service.
docker container: the SQL database for the command service.
docker container: the query service with a connection to a MongoDB. It's listening for changes in the command service to update its database. It should be possible for the main app to call it for data, but not with REST, so with what?
docker container: an event sourcing service listening to all commands and storing them in a MongoDB.
docker container: the event MongoDB.
Here are a couple of questions I don't get:
let's say there is one appointment in the command database and it has already been synced to the query service. Now there is a call to change the title of this appointment. So the command service is not performing an UPDATE but an INSERT with the same id and a new version number. What is it doing afterwards? Reading the new data from SQL and triggering an event with it? The query service is listening and storing the same data in its MongoDB? Is it overwriting the old data or also creating a new entry with a version? That seems quite redundant. Do I in fact really need the SQL database here?
how can the main app call for data from the query service if one doesn't want to use REST?
Because all commands are stored in the event DB (6th docker container), it is possible to restore every state by running all commands again in order. Is that "event sourcing"? Or is it "event sourcing" to not change the data in SQL but to create a new version for each change? I'm confused about what exactly event sourcing is and where to apply it. Do I really need the 5th (and 6th) docker container for event sourcing?
When a client wants to change something but afterwards also show the changed data, the only way I see is to trigger the change and then wait (let's say with polling) for the query service to have that data. What's a good way to achieve that? Maybe checking for the existence of the future version number?
Is this whole structure a reasonable architecture or am I completely missing something?
Sorry, a lot of questions but thanks for any help!
Let’s take this one first.
Is this whole structure a reasonable architecture or am I completely missing something?
Nice architecture plan! I know it feels like there are a lot of moving pieces, but having lots of small pieces instead of one big one is what makes this my favorite pattern.
What is it doing afterwards? Reading the new data from SQL and triggering an event with it? The query service is listening and storing the same data in its MongoDB? Is it overwriting the old data or also creating a new entry with a version? That seems quite redundant. Do I in fact really need the SQL database here?
There are 2 logical databases in CQRS (which can be in the same physical database, but for scaling reasons it's best if they are not): the domain model and the read model. These are very different structures. The domain model is stored as in any CRUD app, with third normal form, etc. The read model is meant to make data reads blazing fast by custom-designing tables that match the data a view needs. There will be a lot of data duplication in these tables. The idea is that it's more responsive to have a table for each view and update that table when the domain model changes; since there's nobody sitting at a keyboard waiting for the view to render, it's OK for the view model data generation to take a little longer. This results in some wasted CPU cycles, because you could update the view model several times before anyone asks for that view, but that's OK since we were really using up idle time anyway.
When a command updates an aggregate and persists it to the DB, it generates a message for the view side of CQRS to update the view. There are 2 ways to do this. The first is to send a message saying “aggregate 83483 needs to be updated” and the view model requeries everything it needs from the domain model and updates the view model. The other approach is to send a message saying “aggregate 83483 was updated to have the following values: …” and the read side can update its tables without having to query. The first approach requires fewer message types but more querying, while the second is the opposite. You can mix and match these two approaches in the same system.
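For illustration, the two message styles might look something like this; the field names are just an example, not a prescribed schema:

```javascript
// Style 1: thin notification – the read side re-queries the domain model.
const thinEvent = {
  type: 'AppointmentChanged',
  aggregateId: '83483'
};

// Style 2: fat event – carries the new values so the read side can update
// its own tables without querying back.
const fatEvent = {
  type: 'AppointmentTitleChanged',
  aggregateId: '83483',
  version: 7,
  changes: { title: 'Dentist follow-up' }
};
```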
Since the read side has very different table structures, you need both databases. On the read side, unless you want the user to be able to see old versions of the appointments, you only have to store the current state of the view so just update existing data. On the command side, keeping historical state using a version number is a good idea, but can make db size grow.
how can the main app call for data from the query service if one doesn't want to use REST?
How the request gets to the query side is unimportant, so you can use REST, postback, GraphQL or whatever.
Is that "event sourcing"?
Event Sourcing is when you persist all changes made to all entities. If the entities are small enough you can persist all properties, but in general events only contain the changes. Then to get the current state you add up all those changes to see what your entities look like at a certain point in time. It has nothing to do with the read model – that's CQRS. Note that events are not the request from the user to make a change; that's a message, which is then used to create a command. An event is a record of all fields that changed as a result of the command. That's an important distinction, because you don't want to re-run all that business logic when rehydrating an entity or aggregate.
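As a rough sketch (entity and event names are made up), rehydration is just a fold over the recorded events:

```javascript
// Rebuild current state by applying each recorded change in order.
function rehydrate(events) {
  return events.reduce((state, event) => {
    switch (event.type) {
      case 'AppointmentCreated':
        return { id: event.aggregateId, ...event.data, version: 1 };
      case 'AppointmentTitleChanged':
        return { ...state, title: event.changes.title, version: event.version };
      default:
        return state;
    }
  }, null);
}

// Note: no business logic runs here – the events already record the outcome
// of the original commands, so replay only applies recorded changes.
```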
When a client wants to change something but afterwards also show the changed data, the only way I see is to trigger the change and then wait (let's say with polling) for the query service to have that data. What's a good way to achieve that? Maybe checking for the existence of the future version number?
Showing historical data is a bit sticky. I would push back on this requirement if you can, but sometimes it’s necessary. If you must do it, take the standard read model approach and save all changes to a view model table. If the circumstances are right you can cheat and read historical data directly from the domain model tables, but that’s breaking a CQRS rule. This is important because one of the advantages of CQRS is its scalability. You can scale the read side as much as you want if each read instance maintains its own read database, but having to read from the domain model will ruin this. This is situation dependent so you’ll have to decide on your own, but the best course of action is to try to get that requirement removed.
In terms of timing, CQRS is all about eventual consistency. The data changes may not show up on the read side for a while (typically fractions of a second but that's enough to cause problems). If you must show new and old data, you can poll and wait for the proper version number to appear, which is ugly. There are other alternatives involving result queues in Rabbit, but they are even uglier.
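If you do end up polling, a sketch of the version-number check could look like this; the fetchAppointment function and the version field are assumptions about your query service:

```javascript
// Poll the read side until the expected version shows up or we give up.
async function waitForVersion(fetchAppointment, id, expectedVersion, timeoutMs = 5000) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const appointment = await fetchAppointment(id);
    if (appointment && appointment.version >= expectedVersion) {
      return appointment;                                      // read model has caught up
    }
    await new Promise(resolve => setTimeout(resolve, 200));    // back off briefly
  }
  throw new Error('read model did not catch up in time');
}
```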

What is the difference between Session and a Local (client-side only) Collection?

In Meteor, I'm a little confused about the difference between Session and a local collection.
I know that Session is a temporary reactive key-value store, client-side only, and is cleaned on page refresh.
A local collection seems to be the same: reactive, temporary, client-side only storage, cleaned on page refresh, but with more flexible functions like insert, update & remove queries, like a server-side Mongo collection.
So I guess I could manage everything in a local collection without Session, or everything in Session without a local collection.
But what is the best and most efficient way to use Session and/or a local collection?
Simply put: when should I use Session, and when should I not?
And when should I use a local collection, and when not?
As I read your question I told myself that this is a very easy question, but then I found myself scratching my head. I tried to figure out an example that you could accomplish with only Session or only collections, but I didn't find any such use case. So let's roll things up from the beginning. Basically you already answered the question on your own, because it is the little bit of sugar that makes collections something special.
When to use a collection?
Basically a collection is a database artifact. Imagine you have a client-server application. All the data is persisted in the server-side storage. Now you can use a local collection to provide the user a small subset of the server's collection. So a client collection is a database with a reduced amount of data. The advantage is that you can access the collection with queries, and you can use the same queries on server and client. In addition, a collection always contains multiple objects of the same type. Sometimes you produce data on the client, for the client, with no server interaction needed. Then you can use a local collection. A local collection provides the same functionality as a normal collection without server communication. This should be used if you have multiple objects with the same structure, especially if you'd like to use query operators.
You can also save the data inside a Session object. Session objects can contain multiple objects as well. But imagine you want to find an object in an object array indexed by a specific id. Then you need to iterate through the whole array in order to find this object. You have to write additional logic for that, which a collection handles like magic. Further, collections return cursors. A cursor is a reactive object that only changes if the selected data changes. That means if you use find with an id, the result only rerenders when the object with this id changes. With Session you can't do that: when a Session value changes, you need to rerender all depending objects.
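To make the contrast concrete, here is a small sketch using Meteor's own APIs; the collection and key names are made up:

```javascript
const Drafts = new Mongo.Collection(null);   // local (client-only) collection

Drafts.insert({ _id: 'a1', title: 'Hello', done: false });
Drafts.insert({ _id: 'a2', title: 'World', done: true });

// Query with Mongo selectors; the returned cursor is reactive, so only
// templates that depend on draft 'a1' rerun when 'a1' changes.
const openDrafts = Drafts.find({ done: false });
const one = Drafts.findOne('a1');

// Session: one flat reactive key-value store; fine for a single value,
// but every dependency on the key reruns whenever the value changes at all.
Session.set('selectedDraftId', 'a1');
const selectedId = Session.get('selectedDraftId');
```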
When to use a session?
For everything else. Sessions are often just small objects that contain some configuration logic. A Session value is basically just one object, not multiple occurrences of objects with the same structure. I don't have time now to go into detail, but if it does not fit the collection use cases, you can use Session.
Have a look at this post that describes why sessions should not be overused.
I assume that by local collection you mean: new Mongo.Collection(null)
The difference is that local collections do not survive hot code pushes. A refresh will erase Session, but a hot code push will not: there's special code in Meteor to persist the values of the Session variable in the case of a hot code push.
You would use Session whenever you're storing temporary values that do NOT need to be persisted to the database.
Trivial examples could include a user's selection of filters, or the item in an index view that is currently selected.
Manipulated data in minimongo (insert, update, delete, etc.) is intended to be sent back to the server and stored in the database. For example, this could be updating a user's profile information.
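A small illustration of that distinction; the collection and field names are made up:

```javascript
const Profiles = new Mongo.Collection('profiles');   // server-backed collection

// UI-only state: never leaves the client, gone on refresh.
Session.set('activeFilter', 'unread');

// minimongo write: applied locally and sent to the server,
// where it is persisted to MongoDB.
Profiles.update(Meteor.userId(), { $set: { bio: 'New bio text' } });
```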

CouchDB validation based on content from existing documents

QUESTION
Is it possible to query other CouchDB documents as part of a standard CouchDB validation function?
If not, what is the standard approach for including properties of other documents as part of a validation rule inside a CouchDB validation function?
RATIONALE
Consider a run-of-the-mill address book application where the validation function is intended to prevent two or more entries from having the same value in the 'e-mail' field of an address book entry.
Consider also an address book application where it is possible to specify validation rules in separate documents, based on whether the postal code is a US-based postal code or something else.
No, it is not possible to query other CouchDB documents in a validate_doc_update function. Each one runs in isolation, receiving references only to the new document, the old document, and the user context (where applicable).
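For reference, a validation function only ever sees those arguments; here is a sketch (the document type and field are just an example):

```javascript
// validate_doc_update: runs on the server for every write to the database.
function (newDoc, oldDoc, userCtx) {
  if (newDoc.type === 'contact' && !newDoc.email) {
    throw({ forbidden: 'contacts must have an email address' });
  }
  // There is no way to query other documents from here, so a rule like
  // "no two contacts may share an email" cannot be enforced in this function.
}
```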
My personal experience has been that there are at least three options for dealing with duplicate checking:
1. Use Cloudant as your CouchDB provider. They offer a free tier for now if you'd like to experiment, but they guarantee consistency across nodes for a CouchDB database. (See #2.)
2. I've used a secondary "reserve table" for names, using the type-key as the ID (see the sketch after this list). Then you need to check for conflicts if you're not using a system like Cloudant. Basically, there's a simple document that maintains a key to prevent duplicates. It's not fun code to write, given that you need to watch for conflicts. (Even with Cloudant, you need to deal with failed write requests, but that's easier than dealing with timing issues surrounding data replication across multiple nodes.)
3. Use a traditional DB, like MySQL for example, that can maintain a unique and consistent index for specific data values like you're describing, but store the documents away in CouchDB. While it's slightly annoying that you need different data providers, it's reliable.
(Optional: decide that CouchDB isn't a great fit for the type of system you're building)
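Here is a rough sketch of the reservation-document idea from option 2. The database name, id scheme and field names are made up, and on a multi-node cluster you still have to watch for replication conflicts afterwards:

```javascript
// Claim a value by using it as the _id of a small side document; CouchDB
// itself rejects a second claim for the same id with 409 Conflict.
async function reserveEmail(email) {
  const res = await fetch(
    `http://localhost:5984/reservations/${encodeURIComponent('email:' + email)}`,
    {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ type: 'email-reservation' })
    }
  );
  if (res.status === 409) {
    return false;   // the id already exists: someone else holds this address
  }
  return res.ok;
}
```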

What is the easiest way to save a LINQ query for later use?

I have a request for a feature to be able to save a user's search for later.
Right now I'm building LINQ statements on the fly based on what the user has specified.
So I started wondering, is there an easy way for me to simply take the query that the user built, and persist it somewhere, preferably my database, so that I can retrieve it later?
Is there some way of persisting the query as XML or perhaps JSON, and then reconstituting the query later?
Never done this before, but I've had this idea:
Rather than having the query run against your database directly, if you were to have it run against an OData endpoint, you could conceivably extract the URL that is generated as the query string, and save that URL for later use. Since OData has a well-thought-out spec already, you would be able to profit from other people's labor.
I'd go with a domain-specific object here even if such goodies did exist -- what happens when you save serialized LINQ queries and your underlying model changes, invalidating everyone's saved queries? Using your own data format should shield you from this to some extent.
Take a look at the Expression class. This will allow you to pre-compile a query, although whether persisting it to the DB for later use would actually improve performance is questionable.
I'm writing this as I watch this presentation at PDC10. Just after the 1-hour mark, he shows how he's built a JSON serializer for expression trees. You might find that interesting.

How to access data in Dynamics CRM?

What is the best way, in terms of platform speed and maintainability, to access data (read only) in Dynamics CRM 4? I've done all three, but I'm interested in the opinions of the crowd.
Via the API
Via the webservices directly
Via DB calls to the views
...and why?
My thoughts normally center around DB calls to the views but I know there are purists out there.
Given both requirements I'd say you want to call the views. Properly crafted SQL queries will fly.
Going through the API is required if you plan to modify data, but it isn't the fastest approach around because it doesn't allow deep loading of entities. For instance, if you want to look at customers and their orders, you'll have to load both up individually and then join them manually, whereas a SQL query will return the data already joined.
Never mind that the TDS stream is a lot more efficient than the SOAP messages used by the API & web services.
UPDATE
I should point out, in regard to the views and the CRM database in general: CRM does not optimize the indexes on the tables or views for custom entities (how could it?). So if you have a truckload entity that you look up by destination all the time, you'll need to add an index for that property. Depending upon your application it could make a huge difference in performance.
I'll add to Jake's comment by saying that querying against the tables directly instead of the views (*base & *extensionbase) will be even faster.
In order of speed it'd be:
direct table query
view query
filtered view query
api call
Direct table updates:
I disagree with Jake that all updates must go through the API. The correct statement is that going through the API is the only supported way to do updates. There are in fact several instances where directly modifying the tables is the most reasonable option:
One time imports of large volumes of data while the system is not in operation.
Modification of specific fields across large volumes of data.
I agree that this sort of direct modification should only be a last resort when the performance of the API is unacceptable. However, if you want to modify a boolean field on thousands of records, doing a direct SQL update to the table is a great option.
Relative Speed
I agree with XVargas as far as relative speed.
Unfiltered Views vs Tables: I have not found the performance advantage to be worth the hassle of manually joining the base and extension tables.
Unfiltered views vs Filtered views: I recently was working with a complicated query which took about 15 minutes to run using the filtered views. After switching to the unfiltered views this query ran in about 10 seconds. Looking at the respective query plans, the raw query had 8 operations while the query against the filtered views had over 80 operations.
Unfiltered Views vs API: I have never compared querying through the API against querying views, but I have compared the cost of writing data through the API vs inserting directly through SQL. Importing millions of records through the API can take several days, while the same operation using insert statements might take several minutes. I assume the difference isn't as great during reads but it is probably still large.
