Is relying on the Oracle ROLLBACK command good practice when importing data: inserting the data, validating it, and THEN performing a ROLLBACK?
I've had a data import program built for our ERP, and looking at the code, it inserts the data into the real tables, validates it, and if validation fails, it performs a ROLLBACK. I've always validated data before inserting, so I'm just curious whether this is an accepted method to rely on.
There are a few things to remember here:
Constraints enable us to preserve data integrity. This means that constraints allow us to enforce business rules (or at least the most basic of them) at the database level itself.
A commit or a rollback is a method of preserving or undoing the changes made in a transaction. If you issue a commit after a series of successfully run DML statements, the changes are preserved. The rollback statement would undo the changes.
If one statement in a series of DML statements fails, the effects of that particular statement are rolled back. E.g., if an UPDATE statement updates 10 rows and one of those violates a vital constraint, none of the 10 rows is updated. However, the effects of the preceding statements are not implicitly rolled back.
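To make the statement-level rollback concrete, here is a minimal sketch; the accounts table and its CHECK constraint are made up for illustration:

    CREATE TABLE accounts (
      id      NUMBER PRIMARY KEY,
      balance NUMBER CHECK (balance >= 0)
    );

    INSERT INTO accounts VALUES (1, 100);
    INSERT INTO accounts VALUES (2, 30);

    -- This UPDATE touches both rows; row 2 would go negative and violate the
    -- CHECK constraint, so neither row is updated (statement-level rollback)...
    UPDATE accounts SET balance = balance - 50;

    -- ...but the two INSERTs above are still pending in the transaction.
    -- Nothing beyond the failed statement is undone until you issue
    -- COMMIT or ROLLBACK yourself.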
In order to preserve data integrity and keep the data as per the business requirements, you must issue a manual ROLLBACK statement if any of the DMLs fail.
What you are seeing in your program is exactly this practice. If you look at the code closely, it doesn't issue a ROLLBACK after a successful transaction, only after a failed DML. Rolling back on failure and committing only if everything goes right is indeed good practice.
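As a rough sketch of that load-validate-rollback pattern (the table names and the validation query are hypothetical, not taken from your program):

    BEGIN
      -- Load straight into the real table
      INSERT INTO orders (order_id, customer_id, amount)
      SELECT order_id, customer_id, amount FROM ext_orders_load;

      -- Validate after the fact and raise if anything is off
      DECLARE
        v_bad PLS_INTEGER;
      BEGIN
        SELECT COUNT(*) INTO v_bad
        FROM   orders o
        WHERE  o.amount < 0
           OR  NOT EXISTS (SELECT 1 FROM customers c
                           WHERE  c.customer_id = o.customer_id);
        IF v_bad > 0 THEN
          RAISE_APPLICATION_ERROR(-20001, v_bad || ' rows failed validation');
        END IF;
      END;

      COMMIT;          -- everything passed
    EXCEPTION
      WHEN OTHERS THEN
        ROLLBACK;      -- undo the whole import on any failure
        RAISE;
    END;
    /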
Front-end checks on data are indeed an essential part of any application. They ensure that the data being entered conforms to the business rules. Even so, constraints must also be applied to perform checks at the database level. This is particularly helpful when some rookie changes the front end and tries to enter invalid data, or when someone bypasses the application and enters data manually. Hence, putting constraints at the database level is always necessary.
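For example, a hedged sketch of such database-level checks (the table and column names are again just illustrative):

    ALTER TABLE orders ADD CONSTRAINT orders_amount_ck
      CHECK (amount >= 0);

    ALTER TABLE orders ADD CONSTRAINT orders_customer_fk
      FOREIGN KEY (customer_id) REFERENCES customers (customer_id);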
Related
I have a third-party Java library that, at one point, gets a JDBC connection, starts a transaction, does several batch updates with PreparedStatement.addBatch(), executes the batch, commits the transaction and closes the connection. Almost immediately afterwards (within <10 milliseconds), the library gets another connection and queries one of the records affected by the update.
For the proper functioning of the library, that query should return the updated record. However, in some rare cases I can see (using P6Spy) that the query returns the record with its values from before the update (and the library then fails further on due to unexpected data).
While trying to understand why this would happen, I found that in my database (Oracle 19c) there is a parameter COMMIT_WAIT that basically allows a call to commit not to block until the commit has finished, i.e. an asynchronous commit. So I used SHOW PARAMETERS to check the value of that parameter and found that COMMIT_WAIT is set to NOWAIT (COMMIT_LOGGING was also set to BATCH).
I began to speculate that the call to commit() merely started the operation (without waiting for it to finish), and that perhaps the next query ran while the operation was still in progress, returning the value of the record from before the transaction. (The isolation level for all connections is Connection.TRANSACTION_READ_COMMITTED.)
Can COMMIT_WAIT set to NOWAIT cause that kind of scenario? I have read that NOWAIT carries a lot of risks, but they mostly refer to things like loss of durability if the database crashes.
Changing the commit behavior should not affect database consistency and should not cause wrong results to be returned.
A little background - Oracle uses REDO for durability (recovering data after an error) and uses UNDO for consistency (making sure the correct results are always returned for any point-in-time). To improve performance, there are many tricks to reduce REDO and UNDO. But changing the commit behavior doesn't reduce the amount of logical REDO and UNDO, it only delays and optimizes the REDO physical writes.
Before a commit happens, and even before your statements return, the UNDO data used for consistency has been written to memory. Changing the commit behavior won't stop the changes from making their way to the UNDO tablespace.
Per the Database Reference for COMMIT_WAIT, "Also, [the parameter] can violate the durability of ACID (Atomicity, Consistency, Isolation, Durability) transactions if the database shuts down unexpectedly." Since the manual is already talking about the "D" in ACID, I assume it would also explicitly mention if the parameter affects the "C".
On the other hand, the above statements are all just theory. It's possible that there's some UNDO optimization bug that's causing the parameter to break something. But I think that would be extremely unlikely. Oracle goes out of its way to make sure that data is never lost or incorrect. (I know because even when I don't want REDO or UNDO it's hard to turn them off.)
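If you want to rule the parameter out in practice, one possibility (a sketch, intended as a test aid rather than a fix) is to inspect the settings and force synchronous commits for a single session or a single commit:

    -- Inspect the current settings
    SELECT name, value
    FROM   v$parameter
    WHERE  name IN ('commit_wait', 'commit_logging');

    -- Force synchronous commits for this session only
    ALTER SESSION SET COMMIT_WAIT = 'WAIT';

    -- Or override the behaviour for a single commit
    COMMIT WRITE WAIT;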
What is the difference between the truncation, transaction and deletion database strategies when using RSpec? I can't find any resources explaining this. I read the Database Cleaner README, but it doesn't explain what each of these does.
Why do we have to use the truncation strategy with Capybara? Do I have to clean up my database when testing at all, or can I disable it? I don't understand why I should clean up my database after each test case; wouldn't that just slow down testing?
The database cleaning strategies refer to database terminology, i.e. the terms come from the (SQL) database world, so people generally familiar with database terminology will know what they mean.
The examples below refer to the SQL definitions. DatabaseCleaner also supports non-SQL types of databases, but the definitions will generally be the same or similar.
Deletion
This means the database tables are cleaned using the SQL DELETE FROM statement. This is usually slower than truncation, but may have advantages of its own.
Truncation
This means the database tables are cleaned using the TRUNCATE TABLE statement. This will simply empty the table immediately, without deleting the table structure itself or deleting records individually.
Transaction
This means using BEGIN TRANSACTION statements coupled with ROLLBACK to roll back a sequence of previous database operations. Think of it as an "undo button" for databases. I would think this is the most frequently used cleaning method, and probably the fastest since changes need not be directly committed to the DB.
Example discussion: Rspec, Cucumber: best speed database clean strategy
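Roughly, in SQL terms, the three strategies boil down to something like this (the users table is invented for illustration):

    -- Deletion: remove the rows one by one
    DELETE FROM users;

    -- Truncation: empty the table in one shot, keeping its structure
    TRUNCATE TABLE users;

    -- Transaction: wrap the test in a transaction and undo it afterwards
    BEGIN;                                   -- BEGIN TRANSACTION
    INSERT INTO users (name) VALUES ('test user');
    -- ... the test runs here ...
    ROLLBACK;                                -- everything above is undone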
Reason for truncation strategy with Capybara
The best explanation was found in the Capybara docs themselves:
# Transactional fixtures do not work with Selenium tests, because Capybara
# uses a separate server thread, which the transactions would be hidden
# from. We hence use DatabaseCleaner to truncate our test database.
Cleaning requirements
You do not necessarily have to clean your database after each test case. However you need to be aware of side effects this could have. I.e. if you create, modify, or delete some records in one step, will the other steps be affected by this?
Normally RSpec runs with transactional fixtures turned on, so you will never notice this when running RSpec - it will simply keep the database automatically clean for you:
https://www.relishapp.com/rspec/rspec-rails/v/2-10/docs/transactions
An application I am trying to support is currently running into unique constraint violations. I haven't been able to reproduce this problem in non-production environments. Is it reasonable, for debugging purposes, to create a rule (or a trigger?) that will just copy every insert into a different table? In effect the new table would be the same as the old table, but without the constraint.
The application is using Spring to manage transactionality, and I haven't been able to find any documentation relating rules to transactions. After the violation, whatever is written so far in the transaction is rolled back - will this affect the rule in any way?
This is Postgres 8.3.
After the violation, whatever is written so far in the transaction is rolled back - will this affect the rule in any way?
That will roll back everything the rule did as well. You could create a trigger that uses dblink to get some work done outside your current transaction. Another option would be a savepoint, but then you would have to change all your current code and transaction handling.
Unique violations are logged in the log files as well; use that information to see what is going wrong. Version 9.0 has a change that also tells you what the offending values are:
Improve uniqueness-constraint violation error messages to report the values causing the failure (Itagaki Takahiro). For example, a uniqueness constraint violation might now report: Key (x)=(2) already exists.
You can do almost anything you can imagine with rules and triggers. And then some more. Your exact intent remains somewhat unclear, though.
If the transaction is rolled back anyway, as you hint at the end, then everything will be undone, including all side-effects of any rules or triggers involved. Your plan would be futile.
There is a workaround in case that is, in fact, what you want to achieve: use dblink to connect back to the same database and INSERT into a table there. That insert is not rolled back with the outer transaction.
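A rough sketch of that dblink idea on the trigger side; the connection string, the availability of dblink on your 8.3 installation, and the table/column names are all assumptions, so treat it as an outline only:

    -- Copy every attempted INSERT on "orders" to an audit table through a
    -- separate dblink connection, so the copy survives a rollback of the
    -- main transaction. BEFORE INSERT so the row is captured even when the
    -- insert itself then fails on the unique constraint.
    CREATE OR REPLACE FUNCTION audit_orders_insert() RETURNS trigger AS $$
    BEGIN
      PERFORM dblink_exec(
        'dbname=mydb user=audit_user',
        'INSERT INTO orders_audit (order_id, customer_id) VALUES ('
          || quote_literal(NEW.order_id::text) || ', '
          || quote_literal(NEW.customer_id::text) || ')'
      );
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_audit_trg
      BEFORE INSERT ON orders
      FOR EACH ROW EXECUTE PROCEDURE audit_orders_insert();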
However, if it's just for debugging purposes, the database log is a much simpler way to see which duplicates were rejected. Errors are logged by default; if not, you can set up logging as you need it. See the manual for your options.
As has been said, rules cannot be used for this purpose, as they only rewrite the query. The rewritten query, just like the original one, is still part of the transaction.
Rules can be used to enforce constraints that are impossible to implement with regular constraints, such as a key being unique across several tables, or other multi-table requirements. (These do have the advantage of the "canary" table name showing up in the logs and error messages.) But the OP already had too many constraints, it appears...
Tweaking the transaction isolation level also seems indicated (are there multiple sessions involved? does the framework use a connection pool?).
Can autonomous transactions be dangerous? If so, in which situations? When are autonomous transactions necessary?
Yes, autonomous transactions can be dangerous.
Consider the situation where you have your main transaction, which has inserted/updated/deleted rows. If you then, within that, start an autonomous transaction, one of the following happens:
(1) It will not query any data at all. This is the 'safe' situation. It can be useful to log information independently of the primary transaction so that it can be committed without impacting the primary transaction (which can be useful for logging error information when you expect the primary transaction to be rolled back).
(2) It will only query data that has not been updated by the primary transaction. This is safe, but superfluous. There is no point to the autonomous transaction.
(3) It will query data that has been updated by the primary transaction. This smacks of a poorly thought-through design, since you've overwritten something and then need to go back to see what it was before you overwrote it. Sometimes people think that an autonomous transaction will still see the uncommitted changes of the primary transaction; it won't. It reads the currently committed state of the database, plus any changes made within the autonomous transaction itself. Some people (often trying autonomous transactions in response to mutating-trigger errors) don't care what state the data is in when they read it, and these people simply shouldn't be allowed access to a database.
(4) It will try to update/delete data that hasn't been updated by the primary transaction. Again, this smacks of poor design. Those changes will be committed (or rolled back) whether the primary transaction succeeds or fails. Worse, you risk issue (5), since it is hard to determine, within an autonomous transaction, whether the data has been updated by the primary transaction.
(5) It will try to update/delete data that has already been updated by the primary transaction, in which case it will deadlock and end up in an ugly mess (see the sketch below).
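A short sketch of case (5); the table, values and procedure name are made up for illustration:

    -- The main transaction locks a row, then calls a procedure declared as an
    -- autonomous transaction which tries to update the same row. The autonomous
    -- transaction waits on the parent's lock while the parent waits for the
    -- call to return, and Oracle reports ORA-00060 (deadlock).
    CREATE OR REPLACE PROCEDURE touch_account(p_id NUMBER) AS
      PRAGMA AUTONOMOUS_TRANSACTION;
    BEGIN
      UPDATE accounts SET balance = balance + 1 WHERE id = p_id;
      COMMIT;
    END;
    /

    BEGIN
      UPDATE accounts SET balance = balance - 1 WHERE id = 1;  -- parent locks row 1
      touch_account(1);                                        -- case (5): same row
      COMMIT;
    END;
    /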
Can autonomous transactions be dangerous?
Yes.
If yes, in which situations?
When they're misused. For example, when used to make changes to data which should have been rolled back if the rest of the parent transaction is rolled back. Misusing them can cause data corruption because some portions of a change are committed, while others are not.
When are autonomous transactions necessary?
They are necessary when the effects of one transaction must survive, regardless of whether the parent transaction is committed or rolled back. A good example is a procedure which logs the progress and activity of a process to a database table.
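A minimal sketch of that logging use case (the table and procedure names are invented):

    CREATE OR REPLACE PROCEDURE log_message(p_text VARCHAR2) AS
      PRAGMA AUTONOMOUS_TRANSACTION;
    BEGIN
      INSERT INTO process_log (logged_at, message)
      VALUES (SYSTIMESTAMP, p_text);
      COMMIT;   -- this row persists even if the caller later rolls back
    END;
    /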
When are autonomous transactions necessary?
Check my question: How can LOCK survive COMMIT or how can changes to LOCKed table be propagated to another session without COMMIT and losing LOCK
We ingest business configurations sequentially and should forbid parallel processing.
I take a lock on the table with configurations and update the other tables accordingly. I commit each batch of updates to the other tables, as we can't afford to keep a transaction open across all records - the probability of a collision would be near 0.99.
Each failure caused by concurrent access is persisted to a log for a later update attempt.
I have a problem understanding read consistency in database (Oracle).
Suppose I am the manager of a bank. A customer has acquired a lock (which I don't know about) and is doing some updating. Now, after they have acquired the lock, I am viewing their account information and trying to do something with it. But because of read consistency I will see the data as it existed before the customer got the lock. So won't that affect the inputs I am getting and the decisions I am going to make during that period?
The point about read consistency is this: suppose the customer rolls back their changes? Or suppose those changes fail because of a constraint violation or some system failure?
Until the customer has successfully committed their changes those changes do not exist. Any decision you might make on the basis of a phantom read or a dirty read would have no more validity than the scenario you describe. Indeed they have less validity, because the changes are incomplete and hence inconsistent. Concrete example: if the customer's changes include making a deposit and making a withdrawal, how valid would your decision be if you had looked at the account when they had made the deposit but not yet made the withdrawal?
Another example: a long running batch process updates the salary of every employee in the organisation. If you run a query against employees' salaries do you really want a report which shows you half the employees with updated salaries and half with their old salaries?
edit
Read consistency is achieved by using the information in the UNDO tablespace (rollback segments in the older implementation). When a session reads data from a table which is being changed by another session, Oracle retrieves the UNDO information which has been generated by that second session and substitutes it for the changed data in the result set presented to the first session.
If the reading session is a long-running query, it might fail with the notorious ORA-1555: snapshot too old. This means the UNDO extent which contained the information necessary to assemble a read-consistent view has been overwritten.
Locks have nothing to do with read consistency. In Oracle writes don't block reads. The purpose of locks is to prevent other processes from attempting to change rows we are interested in.
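A two-session illustration of that last point, with a hypothetical accounts table:

    -- Session A (holds the row lock, has not committed):
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;

    -- Session B, at the same time:
    SELECT balance FROM accounts WHERE id = 1;
    -- Returns immediately with the last committed balance, reconstructed
    -- from UNDO; it neither blocks on Session A's lock nor sees the
    -- uncommitted change.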
For systems that have a large number of users, where users may "hold" a lock for a long time, the Optimistic Offline Lock pattern is usually used, i.e. including a version check in the UPDATE ... WHERE statement.
You can use a date, a version id or something else as the row version. The virtual column ORA_ROWSCN may also be used, but you need to read up on it first.
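For instance, a hedged sketch of the optimistic check using a plain version column (the column and bind names are illustrative):

    UPDATE accounts
    SET    balance = :new_balance,
           version = version + 1
    WHERE  id      = :id
    AND    version = :version_read_earlier;

    -- If the update reports 0 rows affected, somebody else changed the row
    -- since you read it: re-read it and retry instead of overwriting.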
When a record is locked due to changes or an explicit LOCK statement, an entry is made in the header of that block. This is called an ITL (interested transaction list) entry. When you come along to read that block, your session sees this and knows where to go to get the read-consistent copy from the rollback segment.