Tables being affected by current transactions in Oracle

We are designing a web-based application with an Oracle backend for conference room reservations. We have decided to use optimistic locking because we expect the number of collisions to be low.
Now - with optimistic locking, there is always a possibility of a "data already modified by somebody else" scenario.
Our UI involves quite a few fields to be entered, and displaying a message such as "Data has already been modified" is not a pleasant experience for the end user, especially after he/she has entered, say, 15+ fields.
What I am contemplating is displaying a "collision probability" when the end user starts his/her UI session, based upon the first few entries he/she has made in the UI.
This CP (collision probability) would be dynamically calculated by taking into account the database sessions that are in progress against the given table/columns.
For example, if both Person A and Person B are viewing information for Conference Room X, then both will be shown a higher CP (collision probability).
In that case, either of them can decide to wait a few seconds, which is better than re-entering all the data.
Now my question: in Oracle, is there a way to determine which sessions are going against which tables and ROWIDs?

"our UI involves quite a few fields to be entered."
This seems like the sort of problem which could be solved by better flow. Minimise the number of fields a user needs to enter before they can get a list of available suitable rooms. When they choose a room, use pessimistic locking to ensure nobody else can snatch the room while they are completing the booking application. Stash a copy of the initial fields so they can re-run the original query if they change their mind.
Of course, this means maintaining a session and handling state, and we all know web applications suck at that. Which is a way of saying that we often use web technologies when they aren't suited to the application we're writing.
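On the literal question - Oracle only exposes session/row information for rows that are actually locked (for example via V$LOCKED_OBJECT joined to DBA_OBJECTS and V$SESSION); sessions that are merely reading a room's details take no locks, so a reliable "collision probability" cannot be derived from the data dictionary. That is another argument for the pessimistic hold described above. A minimal sketch, assuming a hypothetical CONFERENCE_ROOM table keyed by ROOM_ID:

-- Sessions currently holding DML locks, and on which tables
-- (plain readers never show up here):
SELECT s.sid, s.serial#, s.username, o.owner, o.object_name, l.locked_mode
  FROM v$locked_object l
  JOIN dba_objects o ON o.object_id = l.object_id
  JOIN v$session   s ON s.sid       = l.session_id;

-- Pessimistic hold on the chosen room while the booking form is completed.
-- NOWAIT makes a second session fail immediately (ORA-00054) instead of queueing.
SELECT room_id, status
  FROM conference_room
 WHERE room_id = :room_id
   FOR UPDATE NOWAIT;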


Event Sourcing and concurrent, contradictory events creation

I am having a hard time figuring this one out. Maybe you can help me.
Problem statement:
Imagine there is a system that records financial transactions of an account (like a wallet service). Transactions are stored in a database, and each Transaction denotes an increase or decrease of the balance by a given amount.
On the application code side, when the User wants to make a purchase, all Transactions for his account are pulled from the DB and the current balance is calculated. Based on the result, the customer does or does not have sufficient funds for the purchase (the balance can never go below zero).
Transactions example:
ID, userId, amount, currency, otherData
Transaction(12345, 54321, 180, USD, ...)
Transaction(12346, 54321, -50, USD, ...)
Transaction(12347, 54321, 20, USD, ...)
Those 3 from above would mean the User has a balance of 150 USD.
Concurrent access:
Now, imagine there are 2 or more instances of such an application. Imagine the User has a balance of 100 USD and buys two items worth 100 USD each at the same time. The requests for these purchases go to two different instances, which both read all Transactions from the DB and reduce them into currentBalance. In both replicas, at that moment, the balance equals 100 USD. Both services allow the purchase and each adds a new Transaction(12345, 54321, -100, USD, ...) which decreases the balance by 100.
With two contradictory Transactions inserted into the DB, the balance is incorrect: -100 USD.
Question:
How should I deal with such a situation?
I know that usually optimistic or pessimistic concurrency control is used. So here are my doubts about both:
Optimistic concurrency
It's about keeping a version of the resource and comparing it before the actual update, like a CAS operation. Since Transactions are a form of events - immutable entities - there is no resource whose version I could grab. I do not update anything; I only insert new changes to the balance, which have to be consistent with all other existing Transactions.
Pessimistic concurrency
It's about locking the table/page/row against modification, for cases where conflicts happen more often in the system. Blocking a table/page for each insert is off the table, I think (scalability and high-load concerns). And locking rows - well, which rows do I lock? Again, I do not modify anything in the DB state.
Open ideas
My feeling is, that this kind of problem has to be solved on the application code level. Some, yet vague ideas that come to my mind now:
Distributed cache, which holds "lock of given User", so that only one Transaction can be processed at a time (purchase, deposit, withdrawal, refund, anything).
Each Transaction having a field such as previousTransactionId - a pointer to the last committed Transaction - and some kind of unique index on this field (exactly one Transaction can point to exactly one Transaction in the past, the first Transaction ever having a null value). This way I'd get a constraint-violation error when trying to insert a duplicate.
Asynchronous processing with a queueing system and a topic per user: exactly one instance processing Transactions for a given User one by one. Nice try, but unfortunately I need to be synchronous with the purchase in order to reply to a 3rd-party system.
One thing to note is that typically there's a per-entity offset (a monotonically increasing number, e.g. Account|12345|6789 could be the 6789th event for account #12345) associated with each event. Thus, assuming the DB in which you're storing events supports it, you can get optimistic concurrency control by remembering the highest offset seen when reconstructing the state of that entity and conditioning the insertion of events on there not being events for account #12345 with offsets greater than 6789.
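For a relational store, a minimal sketch of that offset idea, with hypothetical table and column names (account_event, seq_no):

CREATE TABLE account_event (
  account_id NUMBER  NOT NULL,
  seq_no     NUMBER  NOT NULL,          -- per-account offset
  amount     NUMBER  NOT NULL,
  currency   CHAR(3) NOT NULL,
  CONSTRAINT account_event_pk PRIMARY KEY (account_id, seq_no)
);

-- The writer remembers the highest seq_no it saw while computing the balance
-- (say 6789) and appends the new event at seq_no + 1. If another instance got
-- there first, the primary-key violation (ORA-00001) signals the conflict, and
-- the writer re-reads, recomputes the balance, and retries or rejects.
INSERT INTO account_event (account_id, seq_no, amount, currency)
VALUES (:account_id, :last_seen_seq + 1, :amount, :currency);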
There are datastores which support the idea of "fencing": only one instance is allowed to publish events to a particular stream, which is another way to get optimistic concurrency control.
There are approaches which move pessimistic concurrency control into the application/framework/toolkit code. Akka/Akka.Net (disclaimer: I am employed by Lightbend, which maintains and sells commercial support for one of those two projects) has cluster sharding, which allows multiple instances of an application to coordinate ownership of entities between themselves. For example, instance A might have account 12345 and instance B might have account 23456. If instance B receives a request for account 12345, it (massively simplifying) effectively forwards the request to instance A, which enforces that only one request for account 12345 is being processed at a time. This approach can in some ways be thought of as a combination of your ideas 1 (of note: this distributed cache is not only providing concurrency control, but actually caching the application state too, e.g. the account balance and any other data useful for deciding whether a transaction can be accepted) and 3 (even though it presents a synchronous API to the outside world).
Additionally, it is often possible to design the events such that they form a conflict-free replicated data type (CRDT) which effectively allows forks in the event log as long as there's a guarantee that they can be reconciled. One could squint and perhaps see bank accounts allowing overdrafts (where the reconciliation is allowing a negative balance and charging a substantial fee) as an example of a CRDT.
How should I deal with such a situation?
The general term for the problem you are describing is set validation. If there is some property that must hold for the set taken as a whole, then you need to have some form of lock to prevent conflicting writes.
Optimistic/pessimistic are just two different locking implementations.
In the event that you have concurrent writes, the usual mechanism is that the first writer wins. The losers of the race follow the "concurrent modification" branch and either retry (recalculating to ensure that the desired properties still hold) or abort.
In a case like you describe, if your insertion code is responsible for confirming that the user balance is not negative, then that code needs to be able to lock the entire transaction history for the user.
Now: notice the "if" in the previous paragraph, because it's really important. One of the things you need to understand in your domain is whether or not your system is the authority for transactions.
If your system is the authority, then maintaining the invariant is reasonable, because your system can say "no, that one isn't a permitted transaction", and everyone else has to go along with it.
If your system is NOT the authority - if you are getting copies of transactions from "somewhere else" - then your system doesn't have veto power and shouldn't be trying to skip transactions just because the balance doesn't work out.
So we might need a concept like "overdrawn" in our system, rather than trying to state absolutely that balance will always satisfy some invariant.
Fundamentally, collaborative/competitive domains with lots of authorities working in parallel require a different understanding of properties and constraints than the simpler models we can use with a single authority.
In terms of implementation, the usual approach is that the set has a data representation that can be locked as a whole. One common approach is to keep an append-only list of changes to the set (sometimes referred to as the set's history or "event stream").
In relational databases, one successful approach I've seen is to implement a stored procedure that takes the necessary arguments and then acquires the appropriate locks (i.e. applying "tell, don't ask" to the relational data store); that allows you to insulate the application code from the details of the data store.
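A rough PL/SQL sketch of that idea, under the assumption that there is a parent ACCOUNTS row per user whose lock stands in for "the whole transaction history" (all names here are illustrative):

CREATE OR REPLACE PROCEDURE record_purchase(p_account_id NUMBER, p_amount NUMBER) AS
  v_lock_id NUMBER;
  v_balance NUMBER;
BEGIN
  -- Serialise all writers for this account by locking its parent row first.
  SELECT account_id INTO v_lock_id
    FROM accounts
   WHERE account_id = p_account_id
     FOR UPDATE;

  -- Safe to recompute the balance now: no other writer can append
  -- transactions for this account until we commit or roll back.
  SELECT NVL(SUM(amount), 0) INTO v_balance
    FROM transactions
   WHERE account_id = p_account_id;

  IF v_balance - p_amount < 0 THEN
    RAISE_APPLICATION_ERROR(-20001, 'Insufficient funds');
  END IF;

  INSERT INTO transactions (account_id, amount)
  VALUES (p_account_id, -p_amount);
END;
/

The application just calls the procedure and handles the error; it never needs to know which rows were locked or in what order.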

Alert users that their records are out-of-date, as they become out of date?

I have a construction_events table in an Oracle database. Users enter construction events into the table via GIS software.
For example, a user can enter a construction project for 2019.
The event_status would be entered as proposed.
Starts out as legit:
The record is legitimate at the time that it is entered. The user proposes a project for 2019 which makes logical sense.
Becomes out-of-date:
However, as time passes, and we reach say...2020, logically the 2019 project should have been changed to complete, deferred, cancelled, etc..
Unfortunately though, that rarely happens. Users often fail to change the status of events (despite my reminders for them to check). This results in records where the year was in the past (2019), but the status suggests the event is for the future (proposed). This is logically impossible (an event can't be in the past --and-- simultaneously in the future). So, we have a problem.
Question:
Often in databases, we can prevent wrong data from being entered in the first place (check constraints, no nulls, triggers, etc.). However, in this case, the record was, in fact, correct at the time it was entered, so the aforementioned QC measures aren't applicable.
How can I manage this situation so that users are alerted that their records are out-of-date, as they become out of date?
Note: I'm not an I.T. guy or a developer. I'm just a public works data analyst. This might seem like a silly question with an obvious answer, so feel free to provide negative feedback.
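Since a constraint can't catch this (the row only becomes wrong later), one low-tech option is a scheduled query that flags stale rows. A minimal sketch, assuming hypothetical column names event_year and event_status:

-- Records that have become out-of-date: the project year is already in the
-- past, but the status still claims the work is in the future.
SELECT *
  FROM construction_events
 WHERE event_year   < EXTRACT(YEAR FROM SYSDATE)
   AND event_status = 'proposed';

Run on a schedule (or exposed as a view in the GIS software), this gives each user a list of their own stale records to fix.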

How does Oracle handle concurrency in a clustered environment?

I have to implement a database solution wherein contention is handled in a clustered environment. There is a scenario wherein multiple users try to access a bank account at the same time and deposit money into it if the balance is less than $100. How can I make sure that no extra money is deposited? Basically, this query is supposed to fire:
update acct set balance=balance+25 where acct_no=x ;
Since the database is clustered, the account ends up being credited multiple times.
I am looking for a purely Oracle-based solution.
Clustering doesn't matter here: the mechanism that prevents the scenario you're fearing/seeing is locking, and Oracle's row locks apply across all nodes of the cluster.
Consider the scenario of user A and then user B trying to do an update based on a check (less than 100 dollars in the account):
If both the check and the update are done in the same transaction, and the check takes the row lock (for example with SELECT ... FOR UPDATE, or by folding the check into the UPDATE's WHERE clause), locking will prevent user B from completing his check-and-update until user A has done both the check and the actual update. In other words, user B will find the check failing and will not perform the requested action.
When a user says "at the same time", you should know that the computer does not have that concept: all transactions are ordered, no matter how identical the millisecond. Look at the change number kept in the redo logs (the SCN): there is only one counter, so transactions X and Y happen before or after each other, never at the same time.
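A sketch of the folded-in version; the check and the deposit become a single atomic statement, so a second session that waits on the row lock will see the updated balance when it gets its turn:

UPDATE acct
   SET balance = balance + 25
 WHERE acct_no = :x
   AND balance < 100;
-- 0 rows updated means the deposit was skipped: either the balance was
-- already >= 100, or another session's deposit pushed it past 100 first.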
To the original poster: that doesn't sound right. When Oracle locks a row for update, the lock is held across all nodes. What version of Oracle are you using, and can you provide a step-by-step example of what you're doing?
Oracle 11 doc here:
http://docs.oracle.com/cd/B28359_01/server.111/b28318/consist.htm#CNCPT020

Username uniqueness validation - Design Approach

This is a general design problem: I want to validate a username field for uniqueness when the user enters the value and tabs out. I do an Ajax validation and get a response from the server. This is all very standard. Now, what if I have a HUGE user database? How do I handle this situation? I want to find out whether a username such as "foozbarz" is present among 150 million usernames.
Database queries are out of the question [EDIT - rather, read the username database once and populate the cache/hash for faster lookup; to clarify Emil Vikström's point]
In-memory databases won't help either
Keep an in-memory hash (or cache/memcache) to store all usernames - usernames can be easily hashed and lookup will be very fast. But there are some problems with this:
a. Size of the hash - can we optimize it so that we can reduce the hash size?
b. Hash/cache refresh frequency (users might get added while we are validating)
Shard the username table based on some criteria (e.g. A-B in table username_1 and so on) - thanks piotrek for this suggestion
Or is there any other, better approach?
Why don't you simply partition the data? If you have, or plan to have, 150M+ users, I assume you have (or will have) the budget for this. If you are just starting out (with 2k users), do it the traditional way with a simple indexed search on the database. When you have so many users that you observe performance issues, and you have measured that the database is the cause (and not, e.g., the web server), then you simply add another database. On the first one you will have users with names from A to M and the rest on the other one. You may choose another criterion, like a hash, to keep the data balanced. When you need more, you will add more databases. But if you don't have that many users right now, I advise you not to do any premature optimization. There are many things that may become a bottleneck with this amount of data.
You are most likely right about doing some kind of hashing where you store the taken names; anything not in the hash is, obviously, free.
What you shouldn't do is rely on that validation. There can be a lot of time between the user checking whether a name is free and the user pressing Register.
To be fair, you only have one issue here, and that's whether you REALLY need to worry about getting 150 million users. Scalability is often an issue, but unless this happens overnight, you can probably swap in a better solution before it does.
Secondly, there's your worry about two users both getting a THIS NAME IS FREE and then one taking it. First of all, the chances of that happening are pretty damn low. Second, the only ways I can think of ‘solving’ this in a way where a user will never click OK with a validated name and get a USERNAME TAKEN are to either
a) Remember what each user validated last, store that, and if someone else registers that name in the meantime, use AJAX to change the name field to taken and notify the user. Don't do this. A lot of wasted cycles and really too much effort to implement.
b) Lock usernames as a user validates one, for a short period of time. This results in a lot of free usernames showing up as taken when they actually aren't. You probably don't want this either.
The easiest solution is to simply insert the name into the table when the user actually clicks OK, but before doing that, check whether the name exists again. If it does, just send the user back with USERNAME TAKEN. The chances of someone racing someone else for a name are really, really slim, and I doubt anyone will make a big fuss over how your validator (which did its job - the name was free at the point of checking) ‘lied’ to the user.
Basically your only issue is how you want to store the nicknames.
Your #1 criterion is flawed because this is exactly what you have a database system for: to store and manage data. Why do you even have a table with usernames if you're not going to read it?
The first thing to do is improving the database system by adding an index, preferably a HASH index if your database system supports it. You will have a hard time writing anything near the performance of this yourself.
If this is not enough, you must start scaling your database, for example by building a clustered database or by partitioning the table into multiple sub-tables.
What I think is a fair thing to do is implement caching in front of the database, but for single names. Not all usernames will have a collision attempt, so you may cache the small subset where the collisions typically happen. A simple algorithm for checking the collision status of USER:
Check if USER exist in your cache. If it does:
Set a "last checked" timestamp for USER inside the cache
You are done and USER is a collision
Check the database for USER. If it does exist:
Add USER to the cache
If the cache is full (all X slots are used), remove the least recently used username from the cache (or the Y least recently used usernames, if you want to minimize cache pruning).
You are done and USER is a collision
If it didn't match the cache or the db, you are done and USER is NOT a collision.
You will of course still need a UNIQUE constraint in your database to avoid race conditions.
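The database end of this is small; a sketch, assuming a hypothetical USERS table (the cache only saves round trips - the unique constraint is what actually wins the race):

ALTER TABLE users ADD CONSTRAINT users_username_uk UNIQUE (username);

-- On registration, just attempt the insert and treat a constraint violation
-- as "username taken"; there is no separate check-then-insert race window.
BEGIN
  INSERT INTO users (username) VALUES (:username);
EXCEPTION
  WHEN DUP_VAL_ON_INDEX THEN
    RAISE_APPLICATION_ERROR(-20001, 'Username taken');
END;
/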
If you're going the traditional route you could use an appropriate index to improve the database lookup.
You could also try using something like ElasticSearch which has very low latency lookups on large data sets.
If you have 150M+ users, you will have to have in place some function that:
Checks that the user exists, and signals if not found
Verifies the password is correct, and signals if it is not
Retrieves the user's data
You will have this problem regardless and will have to solve it, in all likelihood with something akin to a user query. Even if you rely heavily on sessions, you will still have the problem of "finding session X among many from a 150M+ pool", which is structurally identical to "finding user X among many from a 150M+ pool".
Once you solve the bigger problem, the problem you now have is just its step #1.
So I'd check out a scalable database solution (possibly a NoSQL one), and implement the "availability check" using that.
You might end up with a
retrieveUserData(user, password = None)
which returns the user info if the user and password are valid and correct. For the availability check, you would send no password and expect a UserNotFound exception if the username is available.

Client-server synchronization pattern / algorithm?

I have a feeling that there must be client-server synchronization patterns out there, but I totally failed to Google one up.
The situation is quite simple: the server is the central node that multiple clients connect to and manipulate the same data. Data can be split into atoms; in case of conflict, whatever is on the server has priority (to avoid dragging the user into conflict resolution). Partial synchronization is preferred due to potentially large amounts of data.
Are there any patterns / good practices for such situation, or if you don't know of any - what would be your approach?
Below is how I currently think I'd solve it:
Parallel to the data, a modification journal will be kept, with all transactions timestamped.
When a client connects, it receives all changes since its last check, in consolidated form (the server goes through the lists and removes additions that are followed by deletions, merges updates for each atom, etc.).
Et voila, we are up to date.
An alternative would be keeping a modification date for each record and, instead of performing data deletes, just marking records as deleted.
Any thoughts?
You should look at how distributed change management works. Look at how SVN, CVS, and other repositories that manage deltas work.
You have several use cases.
Synchronize changes. Your change-log (or delta history) approach looks good for this. Clients send their deltas to the server; server consolidates and distributes the deltas to the clients. This is the typical case. Databases call this "transaction replication".
Client has lost synchronization. Either through a backup/restore or because of a bug. In this case, the client needs to get the current state from the server without going through the deltas. This is a copy from master to detail, deltas and performance be damned. It's a one-time thing; the client is broken; don't try to optimize this, just implement a reliable copy.
Client is suspicious. In this case, you need to compare client against server to determine if the client is up-to-date and needs any deltas.
You should follow the database (and SVN) design pattern of sequentially numbering every change. That way a client can make a trivial request ("What revision should I have?") before attempting to synchronize. And even then, the query ("All deltas since 2149") is delightfully simple for the client and server to process.
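A minimal sketch of such a sequentially numbered journal, with hypothetical table and column names; a single sequence is what makes "all deltas since 2149" a trivial query:

CREATE SEQUENCE revision_seq;

CREATE TABLE change_journal (
  revision   NUMBER       PRIMARY KEY,   -- revision_seq.NEXTVAL on insert
  atom_id    NUMBER       NOT NULL,      -- which data atom changed
  action     VARCHAR2(10) NOT NULL,      -- 'insert' / 'update' / 'delete'
  payload    CLOB,                       -- new value, if any
  changed_at TIMESTAMP    DEFAULT SYSTIMESTAMP
);

-- Client: "I am at revision 2149, give me everything newer."
SELECT revision, atom_id, action, payload
  FROM change_journal
 WHERE revision > :client_revision
 ORDER BY revision;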
As part of my team, I did quite a lot of projects which involved data syncing, so I should be competent to answer this question.
Data syncing is quite a broad concept and there is far too much to discuss. It covers a range of different approaches with their upsides and downsides. Here is one possible classification based on two perspectives: Synchronous / Asynchronous, Client-Server / Peer-to-Peer. Syncing implementation depends heavily on these factors, data model complexity, the amount of data transferred and stored, and other requirements. So in each particular case the choice should be in favor of the simplest implementation meeting the app requirements.
Based on a review of existing off-the-shelf solutions, we can delineate several major classes of syncing, different in granularity of objects subject to synchronization:
Syncing of a whole document or database is used in cloud-based applications, such as Dropbox, Google Drive or Yandex.Disk. When the user edits and saves a file, the new file version is uploaded to the cloud completely, overwriting the earlier copy. In case of a conflict, both file versions are saved so that the user can choose which version is more relevant.
Syncing of key-value pairs can be used in apps with a simple data structure, where the variables are considered atomic, i.e. not divided into logical components. This option is similar to syncing of whole documents, as both the value and the document can be overwritten completely. However, from a user perspective a document is a complex object composed of many parts, while a key-value pair is just a short string or a number. Therefore, in this case we can use a simpler conflict-resolution strategy, considering the value more relevant if it has been the last to change.
Syncing of data structured as a tree or a graph is used in more sophisticated applications, where the amount of data is too large to send the database in its entirety at every update. In this case, conflicts have to be resolved at the level of individual objects, fields or relationships. We are primarily focused on this option.
So, we gathered our knowledge into this article, which I think might be very useful to everyone interested in the topic: Data Syncing in Core Data Based iOS apps (http://blog.denivip.ru/index.php/2014/04/data-syncing-in-core-data-based-ios-apps/?lang=en)
What you really need is Operational Transform (OT). This can even cater for the conflicts in many cases.
This is still an active area of research, but there are implementations of various OT algorithms around. I've been involved in such research for a number of years now, so let me know if this route interests you and I'll be happy to put you on to relevant resources.
The question is not crystal clear, but I'd look into optimistic locking if I were you.
It can be implemented with a sequence number that the server returns for each record. When a client tries to save the record back, it will include the sequence number it received from the server. If the sequence number matches what's in the database at the time when the update is received, the update is allowed and the sequence number is incremented. If the sequence numbers don't match, the update is disallowed.
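A sketch of that check in SQL, assuming a hypothetical records table with a seq column; zero rows updated means the client's copy was stale and the save is rejected:

UPDATE records
   SET payload = :new_payload,
       seq     = seq + 1
 WHERE record_id = :record_id
   AND seq       = :seq_from_client;
-- 0 rows updated: someone else saved a newer version; return the current
-- row (with its new seq) so the client can merge or redo the change.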
I built a system like this for an app about 8 years ago, and I can share a couple ways it has evolved as the app usage has grown.
I started by logging every change (insert, update or delete) from any device into a "history" table. So if, for example, someone changes their phone number in the "contact" table, the system will edit the contact.phone field, and also add a history record with action=update, table=contact, field=phone, record=[contact ID], value=[new phone number]. Then whenever a device syncs, it downloads the history items since the last sync and applies them to its local database. This sounds like the "transaction replication" pattern described above.
One issue is keeping IDs unique when items could be created on different devices. I didn't know about UUIDs when I started this, so I used auto-incrementing IDs and wrote some convoluted code that runs on the central server to check new IDs uploaded from devices, change them to a unique ID if there's a conflict, and tell the source device to change the ID in its local database. Just changing the IDs of new records wasn't that bad, but if I create, for example, a new item in the contact table, then create a new related item in the event table, now I have foreign keys that I also need to check and update.
Eventually I learned that UUIDs could avoid this, but by then my database was getting pretty large and I was afraid a full UUID implementation would create a performance issue. So instead of using full UUIDs, I started using randomly generated, 8 character alphanumeric keys as IDs, and I left my existing code in place to handle conflicts. Somewhere between my current 8-character keys and the 36 characters of a UUID there must be a sweet spot that would eliminate conflicts without unnecessary bloat, but since I already have the conflict resolution code, it hasn't been a priority to experiment with that.
The next problem was that the history table was about 10 times larger than the entire rest of the database. This makes storage expensive, and any maintenance on the history table can be painful. Keeping that entire table allows users to roll back any previous change, but that started to feel like overkill. So I added a routine to the sync process where if the history item that a device last downloaded no longer exists in the history table, the server doesn't give it the recent history items, but instead gives it a file containing all the data for that account. Then I added a cronjob to delete history items older than 90 days. This means users can still roll back changes less than 90 days old, and if they sync at least once every 90 days, the updates will be incremental as before. But if they wait longer than 90 days, the app will replace the entire database.
That change reduced the size of the history table by almost 90%, so now maintaining the history table only makes the database twice as large instead of ten times as large. Another benefit of this system is that syncing could still work without the history table if needed -- like if I needed to do some maintenance that took it offline temporarily. Or I could offer different rollback time periods for accounts at different price points. And if there are more than 90 days of changes to download, the complete file is usually more efficient than the incremental format.
If I were starting over today, I'd skip the ID conflict checking and just aim for a key length that's sufficient to eliminate conflicts, with some kind of error checking just in case. (It looks like YouTube uses 11-character random IDs.) The history table and the combination of incremental downloads for recent updates or a full download when needed has been working well.
For delta (change) sync, you can use the pub/sub pattern to publish changes back to all subscribed clients; services like Pusher can do this.
For database mirroring, some web frameworks use a local mini database to sync the server-side database to a local in-browser database; partial synchronization is supported. Check out Meteor.
This page clearly describes most scenarios of data synchronization with patterns and example code: Data Synchronization: Patterns, Tools, & Techniques
It is the most comprehensive source I found, covering delta syncs, strategies for handling deletions, and both server-to-client and client-to-server sync. It is a very good starting point and worth a look.
