Recently I've been looking into implementing the CQRS/ES pattern (CQRS with event sourcing) in my microservices.
I've been reading about these patterns, but there are some questions I couldn't find an answer to anywhere:
When doing CQRS/ES, should each microservice still have its own local database (within the microservice)?
I know that there will be an event store for writes and a read-only projection database, and I totally understand their purpose, but do microservices need their own local database for any reason? (Advantages / disadvantages?)
Example: the order microservice could have a local orders database, the item service a local items database, etc., apart from the event store and the projections database.
How do I validate that some data exists in a microservice before actually issuing a command?
Let's say I want to place a new order, so I assume I first have to check whether the item is still in stock, and then perform the other operation(s).
However, if I want to check whether an item is still in stock, where do I query that data: the projection (read-only) database, or a local database that each microservice has?
I've read many articles about CQRS/ES at this point, but most of them just explain the concept rather than diving into real-life scenarios or explaining how to implement it. I would appreciate any recommendations.
Much appreciated
In general, when dealing with microservices, it's recommended (regardless of whether or not you're doing CQRS/ES) that no two microservices use the same database, or at the very least that no two microservices be writing to the same database. This allows each microservice to control its schema, which only needs to change if the microservice needs it to. One other advantage of this is that the database becomes entirely encapsulated within the service: it's purely an implementation detail.
It's entirely possible that a microservice implementing a read-model might not have a database: it might be able to keep all state in memory (an example might be a read-model which exposes metrics for your monitoring infrastructure), or it might simply be translating events from the write-model into commands to another service (so all of its state is just its position in the event stream).
if I want to check if an item is still in stock, where do I query that data, will it be the projection (read-only) database, or a local database that each microservice has?
In an event-sourced system, every view that's not the stream of events is a projection. So, depending on your requirements, your service can query another service or maintain its own view based on the events.
Note that at any given instant there may exist an event which has been published to the event stream (i.e. it has indisputably happened) but for which there also exists a projection which has not yet processed that event: the projections are eventually consistent with the event stream. So any check of whether an item is in stock will only tell you that the item was in stock at some point in the past (never mind, to use Greg Young's example, that no in-stock data can guarantee that nothing's been stolen from the warehouse, unless the thieves happened to have the decency to update the count as they walked out with their loot). The nanosecond after your query, the projection might receive word of an event that takes the item out of stock before you place your order.
Accordingly, it may just be worth sending a command and letting the write side reject your order if the item is not in stock. The write side (which is the more strongly consistent part of the system, though it should be remembered that in many cases one component's events are another component's commands) is under no obligation to accept every command; "command" in this context really means "polite request to publish events to the event stream which are conformant with my desired state of the universe".
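To make the "commands can be rejected" point concrete, here is a minimal TypeScript sketch; the EventStore interface, the event names and handleReserveItem are all hypothetical, and optimistic concurrency on append is omitted:

```typescript
// Minimal sketch (hypothetical names): the write side rebuilds its own state
// from the event stream and decides whether to accept or reject the command.

type StockEvent =
  | { type: "ItemRestocked"; itemId: string; quantity: number }
  | { type: "ItemReserved"; itemId: string; quantity: number };

interface EventStore {
  load(streamId: string): Promise<StockEvent[]>;           // all past events for a stream
  append(streamId: string, event: StockEvent): Promise<void>;
}

// Command handler for a "polite request to reserve stock for an order".
async function handleReserveItem(
  store: EventStore,
  cmd: { itemId: string; quantity: number }
): Promise<{ accepted: boolean; reason?: string }> {
  const events = await store.load(`item-${cmd.itemId}`);

  // Fold the events into the current quantity on hand (the write model's state).
  const onHand = events.reduce(
    (qty, e) => (e.type === "ItemRestocked" ? qty + e.quantity : qty - e.quantity),
    0
  );

  if (onHand < cmd.quantity) {
    // The command is simply rejected; no event is published.
    return { accepted: false, reason: "insufficient stock" };
  }

  await store.append(`item-${cmd.itemId}`, {
    type: "ItemReserved",
    itemId: cmd.itemId,
    quantity: cmd.quantity,
  });
  return { accepted: true };
}
```

The caller (e.g. the order service) then reacts to the rejection instead of trying to pre-validate against an eventually consistent read model.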
I'm endeavoring to develop an application that uses Oracle as the database back-end. The application will calculate several statistics from the various tables in the database. The front-end will most likely be a web application, and this front-end will display various charts and calculated statistics. Now, I imagine that it would be more efficient to perform the calculations in the database rather than in the service layer, because said calculations would need to be performed for every web request. That being the case, I'm not sure which mechanism to use (e.g. stored procedure, function, view).

To illustrate what I'm going for, suppose I want to keep statistics of student grades for many students. I would like to have a web interface that lets me view those statistics on a student-by-student basis and also on an all-inclusive basis. Some of the stats depend on aggregates (e.g. average, min, max) of all of the student grades, and some stats depend only on an individual student. In this situation, every time a record is added or updated, the aggregates would have to be recalculated.

So I am speculating that if I had a special table that held all of the calculated values I need, and a trigger (or triggers) to recalculate everything when a record is added/updated, then all I would need to do from a web-request point of view is have the service layer pull the desired values from this special table. I'm just not sure if this is the best way to go, so I am asking the community for any input/advice.

Note: Although I'm using Oracle, I'm open to using PostgreSQL or MySQL.
Thanks in advance
The scenario you are describing would be ideal for materialized views. They can be designed to refresh automatically (and incrementally) every time the source data is updated by your application. The calculations would be built into the view definition. No triggers required, and likely no stored procedures unless your calculations involve multiple steps. Check here: https://oracle-base.com/articles/misc/materialized-views and here: https://medium.com/oracledevs/lightning-fast-sql-with-real-time-materialized-views-12-things-developers-will-love-about-oracle-54bcc9eac358 for more info.
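As a hedged sketch of how the pieces might fit together: the table and view names, the queryRows helper, and the refresh options below are illustrative only, and fast refresh of aggregate views has extra Oracle requirements (materialized view logs, additional COUNT/SUM columns for AVG) that are omitted here.

```typescript
// The heavy aggregation lives in the materialized view, so each web request
// only reads precomputed rows.

// DDL you would run once on the Oracle side (fast-refresh prerequisites omitted).
const createStatsMv = `
  CREATE MATERIALIZED VIEW student_grade_stats_mv
  REFRESH FAST ON COMMIT
  AS
  SELECT student_id,
         AVG(grade) AS avg_grade,
         MIN(grade) AS min_grade,
         MAX(grade) AS max_grade,
         COUNT(*)   AS grade_count
  FROM   student_grades
  GROUP  BY student_id
`;

// Assumed helper that runs a query and returns rows (e.g. wrapping your Oracle driver).
declare function queryRows<T>(sql: string, binds?: unknown[]): Promise<T[]>;

interface StudentStats {
  studentId: number;
  avgGrade: number;
  minGrade: number;
  maxGrade: number;
}

// The service layer only reads the precomputed values per request.
async function getStudentStats(studentId: number): Promise<StudentStats | undefined> {
  const rows = await queryRows<StudentStats>(
    `SELECT student_id AS "studentId",
            avg_grade  AS "avgGrade",
            min_grade  AS "minGrade",
            max_grade  AS "maxGrade"
     FROM   student_grade_stats_mv
     WHERE  student_id = :1`,
    [studentId]
  );
  return rows[0];
}
```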
I am building an app, for example a todo app.
Features:
Offline-first (couchdb + pouchdb)
Multi-user
CouchDB options: require_valid_user, couch_peruser.
OK, each user has a private database. But how can I validate docs on POST/PUT?
A design doc with a validate_doc_update function must be in every user db (userdb-{hex}).
How can I place it there and occasionally update it in all the user databases at once? Sync? A third database to replicate from? How can I replicate to all user dbs?
You have three basic options, which are all quite similar:
1. Set up continuous replications between a prototype database and each user's database. For large numbers of users and a low frequency of updates, this could amount to a lot of overhead for little gain.
2. Trigger a one-off replication between a prototype database and each user's database every time an update occurs. This requires you to know when an update occurs in your application, and to manually handle any replication failures that may occur. This overhead may be pretty small in simple scenarios.
3. Have your application update the design docs every time they change, for each user database. This is sort of a manual sync option.
Which option you choose is really up to you, and is a matter of trade-offs.
If you would ever desire the option to update only a subset of users (say a beta testing group), then option 2 or 3 are going to be your best bets.
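Here is a hedged TypeScript sketch of option 3 against CouchDB's HTTP API; the design doc name, the validation rule and the admin URL are just examples.

```typescript
// Assumes Node 18+ (global fetch) and an admin URL for the CouchDB instance.
const COUCH = "http://admin:password@localhost:5984";

// validate_doc_update is stored as a string of JavaScript inside the design doc.
const designDoc = {
  _id: "_design/validation",
  validate_doc_update: `function (newDoc, oldDoc, userCtx, secObj) {
    if (!newDoc._deleted && !newDoc.type) {
      throw({ forbidden: "every document needs a 'type' field" });
    }
  }`,
};

async function pushToAllUserDbs(): Promise<void> {
  // List all databases and keep only the per-user ones (userdb-<hex>).
  const allDbs: string[] = await (await fetch(`${COUCH}/_all_dbs`)).json();
  const userDbs = allDbs.filter((name) => name.startsWith("userdb-"));

  for (const db of userDbs) {
    // Fetch the current revision if the design doc already exists,
    // so the PUT updates it instead of conflicting.
    const existing = await fetch(`${COUCH}/${db}/_design/validation`);
    const doc: Record<string, unknown> = { ...designDoc };
    if (existing.ok) {
      doc._rev = (await existing.json())._rev;
    }

    const res = await fetch(`${COUCH}/${db}/_design/validation`, {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(doc),
    });
    if (!res.ok) {
      console.error(`failed to update ${db}: ${res.status}`);
    }
  }
}
```

Restricting the loop to a subset of user databases (say, a beta group) is then a one-line filter change.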
We are designing a web-based application with an Oracle backend for conference room reservations. Also, we have decided to use optimistic locking because we expect the number of collisions to be on the lower side.
Now, in the case of optimistic locking, there is always the possibility of the "Data Already Modified by Somebody Else" scenario.
Our UI involves quite a few fields to be entered, and displaying a message such as "Data has already been modified" is not a pleasant experience for the end user, especially after he/she has entered, say, 15+ fields.
What I am contemplating is displaying a "collision probability" when the end user starts his/her UI session, based upon the first few entries he/she has made in the UI.
This CP (collision probability) will be dynamically calculated by taking into account the database sessions which are in progress against the given table/columns.
For example, if both Person A and Person B are viewing information for Conference Room X, then both will be shown a higher CP (collision probability).
In such a case, either of them can decide to wait a few seconds. This will be better than re-entering all the data.
Now my question: in Oracle, is there a way to determine which sessions are going against which tables and ROWIDs?
"our UI involves quite a few fields to be entered."
This seems like the sort of problem which could be solved by better flow. Minimise the number of fields a user needs to enter before they can get a list of available suitable rooms. When they choose a room, use pessimistic locking to ensure nobody else can snatch the room while they are completing the booking application. Stash a copy of the initial fields so they can re-run the original query if they change their mind.
Of course, this means maintaining a session and handling state, and we all know web applications suck at that. Which is a way of saying that we often use web technologies when they aren't suited to the application we're writing.
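A hedged sketch of the locking step in TypeScript, with hypothetical table/column names and an assumed execute() helper around your Oracle driver; note that it relies on keeping the transaction open while the user completes the form, which is exactly the session/state burden mentioned above.

```typescript
// Assumed helper that runs a SQL statement in the current transaction.
declare function execute(sql: string, binds?: unknown[]): Promise<unknown>;

// Try to hold a room slot while the user fills in the booking form.
// NOWAIT makes a second session fail immediately (ORA-00054) instead of blocking.
async function tryHoldRoom(roomId: number, slotStart: Date): Promise<boolean> {
  try {
    await execute(
      `SELECT slot_id
         FROM room_slots
        WHERE room_id = :1 AND slot_start = :2
          AND booked = 'N'
        FOR UPDATE NOWAIT`,
      [roomId, slotStart]
    );
    return true;  // keep the transaction open until the booking is committed
  } catch (err) {
    // Another session already holds the row (or it no longer matches):
    // tell the user the room has been snatched.
    return false;
  }
}
```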
I have a feeling that there must be client-server synchronization patterns out there, but I totally failed to google one up.
The situation is quite simple: the server is the central node that multiple clients connect to and manipulate the same data on. Data can be split into atoms; in case of conflict, whatever is on the server has priority (to avoid dragging the user into conflict resolution). Partial synchronization is preferred due to the potentially large amounts of data.
Are there any patterns / good practices for such a situation, or if you don't know of any, what would be your approach?
Below is how I currently think of solving it:
In parallel with the data, a modification journal will be kept, with all transactions timestamped.
When a client connects, it receives all changes since its last check, in consolidated form (the server goes through the lists and removes additions that are later followed by deletions, merges the updates for each atom, etc.).
Et voila, we are up to date.
An alternative would be keeping a modification date for each record and, instead of actually deleting data, just marking records as deleted.
Any thoughts?
You should look at how distributed change management works. Look at how SVN, CVS and other repositories that manage deltas work.
You have several use cases.
Synchronize changes. Your change-log (or delta history) approach looks good for this. Clients send their deltas to the server; server consolidates and distributes the deltas to the clients. This is the typical case. Databases call this "transaction replication".
Client has lost synchronization. Either through a backup/restore or because of a bug. In this case, the client needs to get the current state from the server without going through the deltas. This is a copy from master to detail, deltas and performance be damned. It's a one-time thing; the client is broken; don't try to optimize this, just implement a reliable copy.
Client is suspicious. In this case, you need to compare client against server to determine if the client is up-to-date and needs any deltas.
You should follow the database (and SVN) design pattern of sequentially numbering every change. That way a client can make a trivial request ("What revision should I have?") before attempting to synchronize. And even then, the query ("All deltas since 2149") is delightfully simple for the client and server to process.
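As a rough illustration of that handshake, here is a hedged TypeScript sketch; the endpoints (/api/revision, /api/deltas) and the Delta shape are hypothetical.

```typescript
// Each change is sequentially numbered on the server; the client first asks
// for the latest revision, then requests only the deltas it is missing.

interface Delta {
  revision: number;                        // server-assigned, strictly increasing
  op: "insert" | "update" | "delete";
  atomId: string;
  payload?: unknown;
}

async function syncClient(
  lastKnownRevision: number,
  applyDelta: (d: Delta) => void
): Promise<number> {
  // Cheap check first: "What revision should I have?"
  const { latest } = await (await fetch("/api/revision")).json();
  if (latest === lastKnownRevision) return lastKnownRevision;

  // Otherwise: "All deltas since N."
  const deltas: Delta[] = await (
    await fetch(`/api/deltas?since=${lastKnownRevision}`)
  ).json();

  for (const d of deltas) applyDelta(d);   // server order wins on conflict
  return deltas.length ? deltas[deltas.length - 1].revision : latest;
}
```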
As part of a team, I have done quite a lot of projects involving data syncing, so I should be competent to answer this question.
Data syncing is quite a broad concept, and there is far too much to discuss here. It covers a range of different approaches, each with its upsides and downsides. Here is one possible classification based on two axes: synchronous vs. asynchronous, and client/server vs. peer-to-peer. The syncing implementation depends heavily on these factors, on the complexity of the data model, on the amount of data transferred and stored, and on other requirements. So in each particular case the choice should be in favor of the simplest implementation that meets the app's requirements.
Based on a review of existing off-the-shelf solutions, we can delineate several major classes of syncing, differing in the granularity of the objects subject to synchronization:
Syncing of a whole document or database is used in cloud-based applications, such as Dropbox, Google Drive or Yandex.Disk. When the user edits and saves a file, the new file version is uploaded to the cloud completely, overwriting the earlier copy. In case of a conflict, both file versions are saved so that the user can choose which version is more relevant.
Syncing of key-value pairs can be used in apps with a simple data structure, where the variables are considered to be atomic, i.e. not divided into logical components. This option is similar to syncing of whole documents, as both the value and the document can be overwritten completely. However, from a user's perspective a document is a complex object composed of many parts, whereas a key-value pair is just a short string or a number. Therefore, in this case we can use a simpler conflict-resolution strategy, considering the value that changed last to be the more relevant one (a small sketch of this follows below).
Syncing of data structured as a tree or a graph is used in more sophisticated applications, where the amount of data is large enough that sending the database in its entirety at every update is not an option. In this case, conflicts have to be resolved at the level of individual objects, fields or relationships. We are primarily focused on this option.
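As a hedged illustration of the key-value case, here is a minimal last-write-wins merge in TypeScript; the VersionedValue shape and the timestamp field are assumptions, not a prescribed format.

```typescript
interface VersionedValue {
  value: string | number;
  updatedAt: number;      // e.g. milliseconds since epoch, set by the writer
}

// The value that changed last is treated as the more relevant one.
function mergeLww(local: VersionedValue, remote: VersionedValue): VersionedValue {
  return remote.updatedAt > local.updatedAt ? remote : local;
}
```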
So, we gathered our knowledge into an article which I think might be very useful to everyone interested in the topic: Data Syncing in Core Data Based iOS apps (http://blog.denivip.ru/index.php/2014/04/data-syncing-in-core-data-based-ios-apps/?lang=en)
What you really need is Operational Transform (OT). This can even cater for the conflicts in many cases.
This is still an active area of research, but there are implementations of various OT algorithms around. I've been involved in such research for a number of years now, so let me know if this route interests you and I'll be happy to put you on to relevant resources.
The question is not crystal clear, but I'd look into optimistic locking if I were you.
It can be implemented with a sequence number that the server returns for each record. When a client tries to save the record back, it will include the sequence number it received from the server. If the sequence number matches what's in the database at the time when the update is received, the update is allowed and the sequence number is incremented. If the sequence numbers don't match, the update is disallowed.
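A hedged sketch of that check in TypeScript, with a hypothetical table and an assumed executeUpdate() helper around your database driver:

```typescript
// Assumed helper that runs an UPDATE and reports how many rows it touched.
declare function executeUpdate(
  sql: string,
  binds?: unknown[]
): Promise<{ rowsAffected: number }>;

async function saveRecord(
  id: number,
  fields: { name: string; phone: string },
  seqFromClient: number
): Promise<boolean> {
  // The UPDATE only succeeds if nobody has bumped the sequence number since
  // this client last read the record.
  const result = await executeUpdate(
    `UPDATE contact
        SET name = :1, phone = :2, seq = seq + 1
      WHERE id = :3 AND seq = :4`,
    [fields.name, fields.phone, id, seqFromClient]
  );
  return result.rowsAffected === 1;  // false => stale data, disallow the update
}
```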
I built a system like this for an app about 8 years ago, and I can share a couple ways it has evolved as the app usage has grown.
I started by logging every change (insert, update or delete) from any device into a "history" table. So if, for example, someone changes their phone number in the "contact" table, the system will edit the contact.phone field, and also add a history record with action=update, table=contact, field=phone, record=[contact ID], value=[new phone number]. Then whenever a device syncs, it downloads the history items since the last sync and applies them to its local database. This sounds like the "transaction replication" pattern described above.
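For illustration only, here is roughly what such a history row and its replay might look like in TypeScript; all names (HistoryItem, LocalDb, applyToLocalDb) are made up for the sketch.

```typescript
interface HistoryItem {
  id: number;                        // assigned by the central server, increasing
  action: "insert" | "update" | "delete";
  table: string;                     // e.g. "contact"
  field?: string;                    // e.g. "phone" (for updates)
  recordId: string;                  // e.g. the contact ID
  value?: unknown;                   // e.g. the new phone number
}

// Minimal local-database interface assumed by the sketch.
interface LocalDb {
  insert(table: string, id: string, value: unknown): void;
  setField(table: string, id: string, field: string, value: unknown): void;
  remove(table: string, id: string): void;
}

// A device downloads the history items since its last sync and replays them.
function applyToLocalDb(item: HistoryItem, db: LocalDb): void {
  switch (item.action) {
    case "insert":
      db.insert(item.table, item.recordId, item.value);
      break;
    case "update":
      db.setField(item.table, item.recordId, item.field!, item.value);
      break;
    case "delete":
      db.remove(item.table, item.recordId);
      break;
  }
}
```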
One issue is keeping IDs unique when items could be created on different devices. I didn't know about UUIDs when I started this, so I used auto-incrementing IDs and wrote some convoluted code that runs on the central server to check new IDs uploaded from devices, change them to a unique ID if there's a conflict, and tell the source device to change the ID in its local database. Just changing the IDs of new records wasn't that bad, but if I create, for example, a new item in the contact table, then create a new related item in the event table, now I have foreign keys that I also need to check and update.
Eventually I learned that UUIDs could avoid this, but by then my database was getting pretty large and I was afraid a full UUID implementation would create a performance issue. So instead of using full UUIDs, I started using randomly generated, 8 character alphanumeric keys as IDs, and I left my existing code in place to handle conflicts. Somewhere between my current 8-character keys and the 36 characters of a UUID there must be a sweet spot that would eliminate conflicts without unnecessary bloat, but since I already have the conflict resolution code, it hasn't been a priority to experiment with that.
The next problem was that the history table was about 10 times larger than the entire rest of the database. This makes storage expensive, and any maintenance on the history table can be painful. Keeping that entire table allows users to roll back any previous change, but that started to feel like overkill. So I added a routine to the sync process where if the history item that a device last downloaded no longer exists in the history table, the server doesn't give it the recent history items, but instead gives it a file containing all the data for that account. Then I added a cronjob to delete history items older than 90 days. This means users can still roll back changes less than 90 days old, and if they sync at least once every 90 days, the updates will be incremental as before. But if they wait longer than 90 days, the app will replace the entire database.
That change reduced the size of the history table by almost 90%, so now maintaining the history table only makes the database twice as large instead of ten times as large. Another benefit of this system is that syncing could still work without the history table if needed -- like if I needed to do some maintenance that took it offline temporarily. Or I could offer different rollback time periods for accounts at different price points. And if there are more than 90 days of changes to download, the complete file is usually more efficient than the incremental format.
If I were starting over today, I'd skip the ID conflict checking and just aim for a key length that's sufficient to eliminate conflicts, with some kind of error checking just in case. (It looks like YouTube uses 11-character random IDs.) The history table and the combination of incremental downloads for recent updates or a full download when needed has been working well.
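To tie the pieces together, here is a hedged TypeScript sketch of the server-side decision between incremental deltas and a full snapshot; the helper functions and payload shapes are assumptions, not the author's actual code.

```typescript
// Same general shape as the rows in the history-table sketch above.
interface HistoryRow {
  id: number;
  action: "insert" | "update" | "delete";
  table: string;
  recordId: string;
  value?: unknown;
}

interface SyncResponse {
  kind: "deltas" | "fullSnapshot";
  deltas?: HistoryRow[];
  snapshotUrl?: string;              // file containing all data for the account
}

// Assumed data-access helpers, not shown here.
declare function historyExists(accountId: string, id: number): Promise<boolean>;
declare function exportAccount(accountId: string): Promise<string>;
declare function historyItemsSince(accountId: string, id: number): Promise<HistoryRow[]>;

async function buildSyncResponse(
  accountId: string,
  lastDownloadedHistoryId: number
): Promise<SyncResponse> {
  // If the last item the device saw has been pruned (older than ~90 days),
  // incremental sync is no longer possible: hand back the whole database.
  if (!(await historyExists(accountId, lastDownloadedHistoryId))) {
    return { kind: "fullSnapshot", snapshotUrl: await exportAccount(accountId) };
  }
  return {
    kind: "deltas",
    deltas: await historyItemsSince(accountId, lastDownloadedHistoryId),
  };
}
```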
For delta (change) sync, you can use the pub/sub pattern to publish changes back to all subscribed clients; services like Pusher can do this.
For database mirroring, some web frameworks use a local mini-database to sync the server-side database to a local in-browser database; partial synchronization is supported. Check out Meteor.
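As a very rough, hedged stand-in for the pub/sub idea (not tied to Pusher or Meteor, and in-process rather than over the network), a minimal TypeScript sketch:

```typescript
type Listener = (change: unknown) => void;

class ChangeChannel {
  private listeners = new Set<Listener>();

  // Register a client; the returned function unsubscribes it.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => this.listeners.delete(listener);
  }

  // The server-side write path publishes each change to all subscribers.
  publish(change: unknown): void {
    for (const l of this.listeners) l(change);
  }
}

// Usage: each subscribed client applies the change to its local copy.
const orders = new ChangeChannel();
const unsubscribe = orders.subscribe((change) => {
  console.log("apply to local database:", change);
});
orders.publish({ op: "update", id: "42", field: "status", value: "shipped" });
unsubscribe();
```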
This page clearly describes most scenarios of data synchronization, with patterns and example code: Data Synchronization: Patterns, Tools, & Techniques
It is the most comprehensive source I found, covering delta syncs, strategies for handling deletions, and both server-to-client and client-to-server sync. It is a very good starting point and worth a look.