We are designing a web-based conference room reservation application with an Oracle backend. We have also decided to use optimistic locking, because we expect the number of collisions to be low.
With optimistic locking, however, there is always the possibility of a "Data Already Modified by Somebody Else" scenario.
Our UI involves quite a few fields to be entered, and displaying a message such as "Data has already been modified" is not a pleasant experience for the end user, especially after he/she has entered, say, 15+ fields.
What I am contemplating is displaying a "Collision Probability" when the end user starts his/her UI session, based on the first few entries he/she has made in the UI.
This CP (collision probability) would be calculated dynamically by taking into account the database sessions currently in progress against the given tables/columns.
For example, if both Person A and Person B are viewing information for Conference Room X, then both will be shown a higher CP.
In such a case, either of them can decide to wait a few seconds. This is better than re-entering all the data.
Now my question: in Oracle, is there a way to determine which sessions are accessing which tables and ROWIDs?
"our UI involves quite a few fields to be entered."
This seems like the sort of problem which could be solved by better flow. Minimise the number of fields a user needs to enter before they can get a list of available suitable rooms. When they choose a room, use pessimistic locking to ensure nobody else can snatch the room while they are completing the booking application. Stash a copy of the initial fields so they can re-run the original query if they change their mind.
Of course, this means maintaining a session and handling state, and we all know web applications suck at that. Which is a way of saying that we often use web technologies when they aren't suited to the application we're writing.
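One way to approximate the pessimistic step in a stateless web app is a short-lived "hold" row that the booking flow creates when the user picks a room, so nobody else can snatch it while the form is being completed. This is a minimal sketch, not the answer's prescribed design: the `room_holds` table, the 10-minute expiry, and the function names are hypothetical, and it uses SQLite from Python's standard library only so that it runs standalone. The primary-key constraint does the locking, and the expiry stands in for the session state that web applications handle poorly.

```python
import sqlite3
import time
from typing import Optional

HOLD_SECONDS = 600  # hypothetical hold length (10 minutes)

def create_holds_table(conn: sqlite3.Connection) -> None:
    # One row per held room; the primary key allows only a single holder.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS room_holds ("
        "room_id TEXT PRIMARY KEY, user_id TEXT NOT NULL, expires_at REAL NOT NULL)"
    )

def try_hold(conn: sqlite3.Connection, room_id: str, user_id: str,
             now: Optional[float] = None) -> bool:
    """Place or renew a hold; return False if another user holds the room."""
    now = time.time() if now is None else now
    with conn:  # commit on success, roll back if an exception escapes
        # Expired holds from abandoned sessions must not block the room forever.
        conn.execute("DELETE FROM room_holds WHERE room_id = ? AND expires_at < ?",
                     (room_id, now))
        try:
            conn.execute("INSERT INTO room_holds VALUES (?, ?, ?)",
                         (room_id, user_id, now + HOLD_SECONDS))
            return True
        except sqlite3.IntegrityError:
            # The room is already held: let the same user extend it, refuse others.
            cur = conn.execute(
                "UPDATE room_holds SET expires_at = ? WHERE room_id = ? AND user_id = ?",
                (now + HOLD_SECONDS, room_id, user_id))
            return cur.rowcount == 1
```

A confirmed booking would then delete the hold in the same transaction that inserts the reservation.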
I'm trying to understand how ACID in CockroachDB works without locks, from an application programmer's point of view. Would like to use it for an accounting / ERP application.
When two users update the same database field (e.g. a general ledger account total field) at the same time, what does CockroachDB do? Assume each is also updating many other, non-overlapping fields as part of their respective transactions.
Will the aborted application's commit process be informed about this immediately at the time of the commit?
Do we need to handle any additional possibilities, compared with, for example, ACID/locking PostgreSQL, when we write the database access code in our application?
Or is writing code for accessing CockroachDB, for all practical purposes, the same as for accessing a standard RDBMS with respect to commits and in general?
Of course, ignoring performance issues / joins, etc.
I'm trying to understand how ACID in CockroachDB works without locks, from an application programmer's point of view. Would like to use it for an accounting / ERP application.
CockroachDB does have locks, but uses different terminology. Some of the existing documentation that talks about optimistic concurrency control is currently being updated.
When two users update the same database field (e.g. a general ledger account total field) at the same time, what does CockroachDB do? Assume each is also updating many other, non-overlapping fields as part of their respective transactions.
One of the transactions will block waiting for the other to commit. If a deadlock between the transactions is detected, one of the two transactions involved in the deadlock will be aborted.
Will the aborted application's commit process be informed about this immediately at the time of the commit?
Yes.
Do we need to handle any additional possibilities, compared with, for example, ACID/locking PostgreSQL, when we write the database access code in our application?
Or is writing code for accessing CockroachDB, for all practical purposes, the same as for accessing a standard RDBMS with respect to commits and in general?
At a high level there is nothing additional for you to do. CockroachDB defaults to serializable isolation, which can result in more transaction restarts than weaker isolation levels, but comes with the advantage that the application programmer doesn't have to worry about anomalies.
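In practice, "more transaction restarts" usually means the client should be ready to re-run a transaction that fails with a retryable serialization error (SQLSTATE 40001 in both CockroachDB and PostgreSQL). Below is a minimal, driver-agnostic sketch. `RetryableError` and `run_transaction` are hypothetical names: a real driver raises its own exception type carrying the SQLSTATE code, and you would catch that instead.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class RetryableError(Exception):
    """Hypothetical stand-in for a driver error whose SQLSTATE is 40001."""
    sqlstate = "40001"

def run_transaction(body: Callable[[], T], max_retries: int = 5,
                    base_delay: float = 0.01) -> T:
    """Run body(); if it raises a retryable error, back off and run it again.

    body must perform the whole transaction (BEGIN ... COMMIT) so that a retry
    re-reads the data instead of reusing values read by the failed attempt.
    """
    for attempt in range(max_retries + 1):
        try:
            return body()
        except RetryableError:
            if attempt == max_retries:
                raise
            # Exponential backoff with jitter, so competing clients don't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) * random.random())
    raise AssertionError("unreachable when max_retries >= 0")
```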
I have to implement a database solution in which contention is handled in a clustered environment. In one scenario, multiple users try to access a bank account at the same time and deposit money into it if the balance is less than $100. How can I make sure that no extra money is deposited? Basically, this query is supposed to fire:
update acct set balance=balance+25 where acct_no=x ;
Since the database is clustered, the deposit ends up being made multiple times.
I am looking for a purely Oracle-based solution.
Clustering doesn't matter to the mechanism that prevents the scenario you're fearing/seeing, which is locking.
Consider the scenario of user A and then user B trying to do an update based on a check (less than 100 dollars in the account):
If both the check and the update are done in the same transaction, and the check locks the row (for example with SELECT ... FOR UPDATE, or by putting the condition in the UPDATE's WHERE clause), locking will prevent user B from doing the check UNTIL user A has done both the check and the actual update. In other words, user B will find the check failing and will not perform the requested action.
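Below is a minimal sketch of the single-statement form. It uses the table and column names from the question and its $100 / $25 figures. It runs against SQLite from Python's standard library only so that it is self-contained, but the SQL is plain enough to carry over to Oracle, where the UPDATE locks the row it modifies. The function name is hypothetical.

```python
import sqlite3

def deposit_if_low(conn: sqlite3.Connection, acct_no: int,
                   amount: int = 25, limit: int = 100) -> bool:
    """Deposit only while the balance is below the limit; True if a row changed."""
    with conn:
        cur = conn.execute(
            # The check lives in the UPDATE, so check and write are one atomic step.
            "UPDATE acct SET balance = balance + ? WHERE acct_no = ? AND balance < ?",
            (amount, acct_no, limit),
        )
    return cur.rowcount == 1
```

The row count tells the caller whether the deposit happened. The condition only blocks deposits once the balance has reached 100, so a balance of 80 can still become 105.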
When a user says "at the same time", keep in mind that the database has no such concept: transactions are ordered, even when they fall within the same millisecond. Look at the SCN (system change number) recorded in the redo logs; there is only one counter. Transactions X and Y happen one before the other, never at the same time.
That doesn't sound right. When Oracle locks a row for update, the lock should hold across all nodes. What version of Oracle are you using, and can you provide a step-by-step example of what you're doing?
Oracle 11 doc here:
http://docs.oracle.com/cd/B28359_01/server.111/b28318/consist.htm#CNCPT020
In optimistic concurrency, the usual way to control concurrency is with a timestamp field. However, in my particular case, not all fields need concurrency control.
For example, I have a products table holding the amount of stock. This table has fields like description, code, etc. It is not a problem for me if one user modifies these fields, but I have to detect whether some other user has changed the stock.
So if I use a timestamp, and one user changes the description while another changes the amount of stock, the second user will get an exception.
However, if I use the stock field instead of a timestamp for the concurrency check, then the first user can update the information and the second can update the stock without problems.
Is it a good solution to use the stock field to control concurrency, or is it better to always use a timestamp field?
And if in the future I need to add another important field, do I then need two fields to control concurrency, one for stock and one for the new field? Does that have a high cost in terms of performance?
Consider the definition of optimistic concurrency:
In the field of relational database management systems, optimistic concurrency control (OCC) is a concurrency control method that assumes that multiple transactions can complete without affecting each other, and that therefore transactions can proceed without locking the data resources that they affect. (Wikipedia)
Clearly this definition is abstract and leaves a lot of room for your specific implementation.
Let me give you an example. A few years back, a bunch of colleagues and I evaluated the same thing, and we realized that in our application, on some of the tables, it was okay for the concurrency check to be based simply on the fields the user was updating.
So, in other words, as long as the fields they were updating hadn't changed since they read the row, we let them update it. The rest of the fields really didn't matter, and the row was going to be refreshed on update anyway, so they would get the most recent changes made by other users.
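Below is a minimal sketch of that field-scoped check. It assumes a hypothetical `products` table with the `description` and `stock` columns from the question. Each UPDATE succeeds only if the column it changes still holds the value the user originally read, so a concurrent edit to a different column does not cause a conflict. SQLite is used only to keep the example self-contained.

```python
import sqlite3

def update_stock(conn: sqlite3.Connection, product_id: int,
                 old_stock: int, new_stock: int) -> bool:
    """Optimistic update guarded only by the stock value the user read earlier."""
    with conn:
        cur = conn.execute(
            "UPDATE products SET stock = ? WHERE id = ? AND stock = ?",
            (new_stock, product_id, old_stock),
        )
    # 0 rows means someone changed stock (or deleted the row) since it was read.
    return cur.rowcount == 1

def update_description(conn: sqlite3.Connection, product_id: int,
                       old_desc: str, new_desc: str) -> bool:
    """Same pattern for description; concurrent stock changes don't affect it."""
    with conn:
        cur = conn.execute(
            "UPDATE products SET description = ? WHERE id = ? AND description = ?",
            (new_desc, product_id, old_desc),
        )
    return cur.rowcount == 1
```

Guarding another field later means adding another `AND column = ?` condition to the updates that touch it. Because the row is already located by primary key, the extra comparison is usually cheap.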
So, in short, I would say what you're doing is just fine and there aren't really any hard and fast rules. It really depends on what you need. If you need it to be more flexible, like what you're talking about, then make it more flexible -- simple.
I have a feeling that there must be client-server synchronization patterns out there, but I completely failed to find one by googling.
The situation is quite simple: the server is the central node that multiple clients connect to, and they manipulate the same data. Data can be split into atoms; in case of conflict, whatever is on the server has priority (to avoid involving the user in conflict resolution). Partial synchronization is preferred due to potentially large amounts of data.
Are there any patterns / good practices for such a situation, or, if you don't know of any, what would be your approach?
Below is how I am currently thinking of solving it:
Parallel to the data, a modification journal will be kept, with all transactions timestamped.
When a client connects, it receives all changes since its last check, in consolidated form (the server goes through the lists and removes additions that are followed by deletions, merges updates for each atom, etc.).
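Below is a minimal sketch of the consolidation step. It assumes each journal entry is a hypothetical `(timestamp, op, atom_id, fields)` tuple with op one of "add", "update" or "delete", and it treats "add" as "replace the whole atom". Updates are merged per atom, and an atom whose first change in the window was an add and whose last was a delete is dropped entirely, since the client never saw it.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical journal entry: (timestamp, op, atom_id, fields).
Entry = Tuple[float, str, str, Dict[str, Any]]
Change = Tuple[str, Dict[str, Any]]

def consolidate(journal: List[Entry], since: float) -> Dict[str, Change]:
    """Collapse entries newer than `since` into at most one change per atom."""
    first_op: Dict[str, str] = {}
    state: Dict[str, Change] = {}
    for ts, op, atom, fields in sorted(journal, key=lambda e: e[0]):
        if ts <= since:
            continue  # the client already has this change
        first_op.setdefault(atom, op)
        prev = state.get(atom)
        if op == "add":
            state[atom] = ("add", dict(fields))
        elif op == "update":
            if prev is not None and prev[0] in ("add", "update"):
                # Merge into the earlier add/update; later values win.
                state[atom] = (prev[0], {**prev[1], **fields})
            else:
                state[atom] = ("update", dict(fields))
        elif op == "delete":
            state[atom] = ("delete", {})
        else:
            raise ValueError(f"unknown op {op!r}")
    # Added and then deleted within the window: the client never needs to hear about it.
    return {a: c for a, c in state.items()
            if not (first_op[a] == "add" and c[0] == "delete")}
```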
Et voila, we are up to date.
An alternative would be to keep a modification date for each record and, instead of physically deleting data, just mark records as deleted.
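Below is a sketch of that alternative. It assumes a hypothetical `items` table with `modified_at` and `deleted` columns. A delete becomes an update that sets a tombstone flag, so a single "modified since" query returns both changed and deleted records. SQLite again keeps the example self-contained.

```python
import sqlite3
from typing import List, Tuple

def create_items_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT,"
        " modified_at REAL NOT NULL, deleted INTEGER NOT NULL DEFAULT 0)"
    )
    # Index so the "changed since" query does not scan the whole table.
    conn.execute("CREATE INDEX items_modified ON items (modified_at)")

def save_item(conn: sqlite3.Connection, item_id: int, value: str, now: float) -> None:
    with conn:
        # Insert or overwrite; a re-saved item comes back with deleted = 0.
        conn.execute("INSERT OR REPLACE INTO items (id, value, modified_at) VALUES (?, ?, ?)",
                     (item_id, value, now))

def soft_delete(conn: sqlite3.Connection, item_id: int, now: float) -> None:
    with conn:
        # Keep the row as a tombstone instead of removing it.
        conn.execute("UPDATE items SET deleted = 1, modified_at = ? WHERE id = ?",
                     (now, item_id))

def changes_since(conn: sqlite3.Connection, last_sync: float) -> List[Tuple[int, str, int]]:
    """Rows changed after last_sync, including tombstones (deleted = 1)."""
    return conn.execute(
        "SELECT id, value, deleted FROM items WHERE modified_at > ? ORDER BY modified_at",
        (last_sync,),
    ).fetchall()
```

Tombstones accumulate, so some purge policy is eventually needed. A client that last synced before a purge would then need a full copy.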
Any thoughts?
You should look at how distributed change management works: look at how SVN, CVS, and other repositories that manage deltas work.
You have several use cases.
Synchronize changes. Your change-log (or delta history) approach looks good for this. Clients send their deltas to the server; server consolidates and distributes the deltas to the clients. This is the typical case. Databases call this "transaction replication".
Client has lost synchronization. Either through a backup/restore or because of a bug. In this case, the client needs to get the current state from the server without going through the deltas. This is a copy from master to detail, deltas and performance be damned. It's a one-time thing; the client is broken; don't try to optimize this, just implement a reliable copy.
Client is suspicious. In this case, you need to compare client against server to determine if the client is up-to-date and needs any deltas.
You should follow the database (and SVN) design pattern of sequentially numbering every change. That way a client can make a trivial request ("What revision should I have?") before attempting to synchronize. And even then, the query ("All deltas since 2149") is delightfully simple for the client and server to process.
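Below is a minimal in-memory sketch of sequential change numbering. `ChangeLog`, its methods, and the dict-shaped deltas are hypothetical illustrations, not any particular product's API. Every recorded change gets the next revision number. A client first compares its stored revision with `head` and then asks only for the deltas after it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

@dataclass
class ChangeLog:
    """Server-side log in which every change gets the next revision number."""
    entries: List[Tuple[int, Dict[str, Any]]] = field(default_factory=list)

    @property
    def head(self) -> int:
        # Answers "what revision should I have?": the last number issued (0 if none).
        return self.entries[-1][0] if self.entries else 0

    def record(self, delta: Dict[str, Any]) -> int:
        rev = self.head + 1
        self.entries.append((rev, delta))
        return rev

    def deltas_since(self, rev: int) -> List[Tuple[int, Dict[str, Any]]]:
        # Entries are appended in revision order, so a simple filter suffices.
        return [e for e in self.entries if e[0] > rev]
```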
As part of a team, I have worked on quite a lot of projects that involved data syncing, so I should be competent to answer this question.
Data syncing is quite a broad concept, and there is far too much to discuss. It covers a range of approaches, each with its upsides and downsides. One possible classification is based on two perspectives: synchronous / asynchronous, and client/server / peer-to-peer. A syncing implementation depends heavily on these factors, on data model complexity, on the amount of data transferred and stored, and on other requirements. So in each particular case, the choice should favor the simplest implementation that meets the app's requirements.
Based on a review of existing off-the-shelf solutions, we can delineate several major classes of syncing, which differ in the granularity of the objects subject to synchronization:
Syncing of a whole document or database is used in cloud-based applications, such as Dropbox, Google Drive or Yandex.Disk. When the user edits and saves a file, the new file version is uploaded to the cloud completely, overwriting the earlier copy. In case of a conflict, both file versions are saved so that the user can choose which version is more relevant.
Syncing of key-value pairs can be used in apps with a simple data structure, where the variables are considered atomic, i.e. not divided into logical components. This option is similar to syncing whole documents, as both a value and a document can be overwritten completely. However, from a user's perspective a document is a complex object composed of many parts, whereas a key-value pair is just a short string or a number. Therefore, in this case we can use a simpler conflict resolution strategy: the value that changed last is considered the more relevant one.
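Below is a sketch of that last-writer-wins rule. It assumes each value carries a hypothetical change timestamp plus a device ID that is used only to break ties deterministically. Because the comparison is symmetric, merging in either direction gives the same result. It does rely on device clocks being roughly in agreement, and clock skew can make the genuinely later change lose.

```python
from typing import Dict, Tuple

# Hypothetical store layout: key -> (timestamp, device_id, value).
Store = Dict[str, Tuple[float, str, str]]

def merge_lww(local: Store, remote: Store) -> Store:
    """Merge two key-value stores, keeping the most recently changed value per key."""
    merged = dict(local)
    for key, incoming in remote.items():
        current = merged.get(key)
        # Compare (timestamp, device_id): later timestamp wins, device_id breaks ties.
        if current is None or incoming[:2] > current[:2]:
            merged[key] = incoming
    return merged
```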
Syncing of data structured as a tree or a graph is used in more sophisticated applications where the amount of data is too large to send the entire database at every update. In this case, conflicts have to be resolved at the level of individual objects, fields, or relationships. We are primarily focused on this option.
So, we gathered our knowledge into this article, which I think might be very useful to anyone interested in the topic: Data Syncing in Core Data Based iOS apps (http://blog.denivip.ru/index.php/2014/04/data-syncing-in-core-data-based-ios-apps/?lang=en)
What you really need is Operational Transform (OT). It can even handle conflicts in many cases.
This is still an active area of research, but there are implementations of various OT algorithms around. I've been involved in such research for a number of years now, so let me know if this route interests you and I'll be happy to put you on to relevant resources.
The question is not crystal clear, but I'd look into optimistic locking if I were you.
It can be implemented with a sequence number that the server returns for each record. When a client tries to save the record back, it will include the sequence number it received from the server. If the sequence number matches what's in the database at the time when the update is received, the update is allowed and the sequence number is incremented. If the sequence numbers don't match, the update is disallowed.
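Below is a minimal sketch of that sequence-number check. It assumes a hypothetical `records` table with a `version` column and uses SQLite to stay self-contained. The comparison and the increment happen in one UPDATE, so when two clients save with the same version, only the first one matches.

```python
import sqlite3
from typing import Optional, Tuple

def create_records_table(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT,"
                 " version INTEGER NOT NULL)")

def load_record(conn: sqlite3.Connection, rec_id: int) -> Optional[Tuple[str, int]]:
    # The client keeps the version it read and sends it back with its save.
    return conn.execute("SELECT data, version FROM records WHERE id = ?",
                        (rec_id,)).fetchone()

def save_record(conn: sqlite3.Connection, rec_id: int, new_data: str,
                seen_version: int) -> bool:
    """Apply the save only if the stored version still equals seen_version."""
    with conn:
        # Compare and increment in one statement, so two racing saves cannot both match.
        cur = conn.execute(
            "UPDATE records SET data = ?, version = version + 1 WHERE id = ? AND version = ?",
            (new_data, rec_id, seen_version))
    return cur.rowcount == 1
```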
I built a system like this for an app about 8 years ago, and I can share a couple ways it has evolved as the app usage has grown.
I started by logging every change (insert, update or delete) from any device into a "history" table. So if, for example, someone changes their phone number in the "contact" table, the system will edit the contact.phone field, and also add a history record with action=update, table=contact, field=phone, record=[contact ID], value=[new phone number]. Then whenever a device syncs, it downloads the history items since the last sync and applies them to its local database. This sounds like the "transaction replication" pattern described above.
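Below is a sketch of the device side of that history mechanism. The item fields mirror the example in the text (`action`, `table`, `field`, `record`, `value`). The numeric `id` used as the sync checkpoint, the dict standing in for the device database, and the function name are all assumptions.

```python
from typing import Any, Dict, List

# Device-side database: table -> record id -> field -> value.
LocalDB = Dict[str, Dict[str, Dict[str, Any]]]

def apply_history(db: LocalDB, items: List[Dict[str, Any]], last_applied: int) -> int:
    """Apply history items newer than last_applied in id order; return the new checkpoint."""
    for item in sorted(items, key=lambda i: i["id"]):
        if item["id"] <= last_applied:
            continue  # already applied during an earlier sync
        table = db.setdefault(item["table"], {})
        if item["action"] == "delete":
            table.pop(item["record"], None)
        else:
            # insert and update both set one field, creating the record if needed.
            table.setdefault(item["record"], {})[item["field"]] = item["value"]
        last_applied = item["id"]
    return last_applied
```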
One issue is keeping IDs unique when items could be created on different devices. I didn't know about UUIDs when I started this, so I used auto-incrementing IDs and wrote some convoluted code that runs on the central server to check new IDs uploaded from devices, change them to a unique ID if there's a conflict, and tell the source device to change the ID in its local database. Just changing the IDs of new records wasn't that bad, but if I create, for example, a new item in the contact table, then create a new related item in the event table, now I have foreign keys that I also need to check and update.
Eventually I learned that UUIDs could avoid this, but by then my database was getting pretty large and I was afraid a full UUID implementation would create a performance issue. So instead of using full UUIDs, I started using randomly generated, 8 character alphanumeric keys as IDs, and I left my existing code in place to handle conflicts. Somewhere between my current 8-character keys and the 36 characters of a UUID there must be a sweet spot that would eliminate conflicts without unnecessary bloat, but since I already have the conflict resolution code, it hasn't been a priority to experiment with that.
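The "sweet spot" can be estimated with the birthday bound. With n random IDs drawn uniformly from N possible values, the probability of at least one collision is approximately 1 - exp(-n(n-1) / (2N)). Below is a small sketch that assumes a 62-symbol alphabet (a-z, A-Z, 0-9); the text only says "alphanumeric", so the actual alphabet may be smaller.

```python
import math
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # assumed 62-symbol alphabet

def random_id(length: int = 8) -> str:
    # secrets uses the OS CSPRNG, so IDs are uniform and unpredictable.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def collision_probability(n_ids: int, length: int,
                          alphabet_size: int = len(ALPHABET)) -> float:
    """Birthday-bound estimate of P(at least one duplicate among n_ids random IDs)."""
    space = alphabet_size ** length
    # -expm1(-x) computes 1 - exp(-x) accurately even when x is tiny.
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))
```

Under these assumptions, a million 8-character IDs give roughly a 0.2% chance of at least one duplicate, while 11 characters bring it down to about one in a hundred million.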
The next problem was that the history table was about 10 times larger than the entire rest of the database. This makes storage expensive, and any maintenance on the history table can be painful. Keeping that entire table allows users to roll back any previous change, but that started to feel like overkill. So I added a routine to the sync process where if the history item that a device last downloaded no longer exists in the history table, the server doesn't give it the recent history items, but instead gives it a file containing all the data for that account. Then I added a cronjob to delete history items older than 90 days. This means users can still roll back changes less than 90 days old, and if they sync at least once every 90 days, the updates will be incremental as before. But if they wait longer than 90 days, the app will replace the entire database.
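Below is a sketch of that retention rule as a server-side decision. If the history item the device last downloaded has been purged, the server sends a full snapshot; otherwise it sends only newer items. The 90-day constant comes from the text. The function names and the `day` field (days since some epoch) are illustrative assumptions.

```python
from typing import Any, Dict, List, Tuple

RETENTION_DAYS = 90

def purge(history: List[Dict[str, Any]], now_day: int) -> List[Dict[str, Any]]:
    """What the cron job keeps: items no older than the retention window."""
    return [h for h in history if now_day - h["day"] <= RETENTION_DAYS]

def plan_sync(history: List[Dict[str, Any]],
              last_downloaded_id: int) -> Tuple[str, List[Dict[str, Any]]]:
    """Return ("incremental", newer items), or ("full", []) if the checkpoint is gone."""
    if all(h["id"] != last_downloaded_id for h in history):
        return ("full", [])  # the caller sends a file with all of the account's data
    return ("incremental", [h for h in history if h["id"] > last_downloaded_id])
```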
That change reduced the size of the history table by almost 90%, so now maintaining the history table only makes the database twice as large instead of ten times as large. Another benefit of this system is that syncing could still work without the history table if needed -- like if I needed to do some maintenance that took it offline temporarily. Or I could offer different rollback time periods for accounts at different price points. And if there are more than 90 days of changes to download, the complete file is usually more efficient than the incremental format.
If I were starting over today, I'd skip the ID conflict checking and just aim for a key length that's sufficient to eliminate conflicts, with some kind of error checking just in case. (It looks like YouTube uses 11-character random IDs.) The history table and the combination of incremental downloads for recent updates or a full download when needed has been working well.
For delta (change) sync, you can use the pub/sub pattern to publish changes to all subscribed clients; services like Pusher can do this.
For database mirroring, some web frameworks use a local mini-database in the browser that syncs with the server-side database, and partial synchronization is supported. Check out Meteor.
This page clearly describes most data synchronization scenarios, with patterns and example code: Data Synchronization: Patterns, Tools, & Techniques
It is the most comprehensive source I found, covering delta syncs, strategies for handling deletions, and both server-to-client and client-to-server sync. It is a very good starting point and worth a look.