Database read locking - Spring

I have a use case where I need to do the following things in one transaction:
start the transaction
INSERT an item into a table
SELECT all the items in the table
dump the selected items into a file (this file is versioned and another program always uses the latest version)
If all the above steps succeed, commit the transaction; if not, roll back.
If two transactions begin almost simultaneously, it is possible that before the first transaction A commits its insert, the second transaction B has already performed its SELECT, whose result does not yet contain the item inserted by A (it is not yet committed by A, so it is not visible to B). In this case, when A finishes, it will have correctly dumped a file File1 containing its inserted item. Later, when B finishes, it will have dumped another file File2 containing only its own inserted item but not the one inserted by A. Since File2 is more recent, we will use File2. The problem is that File2 doesn't contain the item inserted by A, even though that item is indeed in the DB.
I would like to know whether it is feasible to solve this problem by locking reads (SELECT) of the table from the moment a transaction inserts something into it until that transaction commits or rolls back, and if so, how this locking can be implemented in Spring with Oracle as the DB.

You need some sort of synchronization between the transactions:
start the transaction
Obtain a lock that prevents the transaction in another session from proceeding, or wait until the transaction in the other session finishes
INSERT an item into a table
SELECT ......
......
Commit and release the lock
The easiest way is to use the LOCK TABLE command, at least in SHARE mode (SHARE ROW EXCLUSIVE or EXCLUSIVE modes can also be used, but they are too restrictive for this case).
The advantage of this approach is that the lock is automatically released at commit or rollback.
The disadvantage is that this lock can interfere with other transactions in the system that update this table at the same time, and could reduce overall performance.
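To give an idea of what this looks like from Spring, here is a minimal sketch using the lock mode suggested above. The items table, the name column and the dumpToFile helper are placeholders for your own schema and file-writing logic; the lock statement simply runs as the first statement inside the @Transactional method, on the same connection:
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ItemDumpService {

    private final JdbcTemplate jdbcTemplate;

    public ItemDumpService(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Transactional
    public void insertAndDump(String newItem) {
        // Take the table lock first (mode as suggested above); it is held until
        // this transaction commits or rolls back, serializing insert+dump runs.
        jdbcTemplate.execute("LOCK TABLE items IN SHARE MODE");

        jdbcTemplate.update("INSERT INTO items (name) VALUES (?)", newItem);

        List<String> allItems =
                jdbcTemplate.queryForList("SELECT name FROM items", String.class);

        dumpToFile(allItems); // placeholder for writing the versioned file
    }

    private void dumpToFile(List<String> items) {
        // ... write the file that the other program reads ...
    }
}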
Another approach is to use the DBMS_LOCK package. This lock doesn't affect other transactions that don't explicitly use it. The drawback is that this package is harder to use, and the lock is not released on commit or rollback: you must explicitly release it at the end of the transaction, so all exceptions must be handled carefully, otherwise a deadlock could easily occur.
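For illustration, a minimal sketch of how DBMS_LOCK might be driven from Spring via the same JdbcTemplate. The lock id 12345 is an arbitrary application choice, and EXECUTE privilege on DBMS_LOCK is required. The last argument to DBMS_LOCK.REQUEST is release_on_commit; it is set to TRUE here so the lock goes away when the transaction ends, otherwise you must call DBMS_LOCK.RELEASE yourself as described above:
// Acquire an application-level lock before the INSERT/SELECT/dump work.
// DBMS_LOCK.REQUEST arguments: lock id, lock mode (exclusive),
// timeout in seconds, release_on_commit.
jdbcTemplate.execute(
    "DECLARE " +
    "  l_status INTEGER; " +
    "BEGIN " +
    "  l_status := DBMS_LOCK.REQUEST(12345, DBMS_LOCK.X_MODE, 10, TRUE); " +
    "  IF l_status <> 0 THEN " +
    "    RAISE_APPLICATION_ERROR(-20000, 'Lock not acquired, status ' || l_status); " +
    "  END IF; " +
    "END;");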
One more solution is to create a "dummy" table with a single row in it, for example:
CREATE TABLE my_special_lock_table(
  x INT
);
INSERT INTO my_special_lock_table VALUES(1);
COMMIT;
and then use SELECT x FROM my_special_lock_table FOR UPDATE
or, even easier, simply UPDATE my_special_lock_table SET x=x in your transaction.
This will place an exclusive lock on the row in this table and thus synchronize only the transactions that take this lock.
A drawback is that another "dummy" table must be created.
But this solution doesn't affect the other transactions in the system, the lock is automatically released upon commit or rollback, and it is portable - it should work in all other databases, not only in Oracle.
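From Spring, this variant just means issuing the locking statement as the first statement of the transactional method, for example (reusing the hypothetical JdbcTemplate-based service sketched earlier):
// Serializes competing transactions on the single row of my_special_lock_table;
// the row lock is released automatically when the transaction commits or rolls back.
jdbcTemplate.update("UPDATE my_special_lock_table SET x = x");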

Use Spring's REPEATABLE_READ or SERIALIZABLE isolation levels:
REPEATABLE_READ A constant indicating that dirty reads and
non-repeatable reads are prevented; phantom reads can occur. This
level prohibits a transaction from reading a row with uncommitted
changes in it, and it also prohibits the situation where one
transaction reads a row, a second transaction alters the row, and the
first transaction rereads the row, getting different values the second
time (a "non-repeatable read").
SERIALIZABLE A constant indicating that dirty reads, non-repeatable
reads and phantom reads are prevented. This level includes the
prohibitions in ISOLATION_REPEATABLE_READ and further prohibits the
situation where one transaction reads all rows that satisfy a WHERE
condition, a second transaction inserts a row that satisfies that
WHERE condition, and the first transaction rereads for the same
condition, retrieving the additional "phantom" row in the second read.
With SERIALIZABLE or REPEATABLE_READ, the group of statements is protected from non-repeatable reads:
connection 1:                                connection 2:
set transaction isolation level
  repeatable read
begin transaction
select name from users where id = 1
                                             update users set name = 'Bill' where id = 1
select name from users where id = 1                      |
commit transaction                                        |
                                                          |--> executed here
In this scenario, the update will block until the first transaction is complete.
Higher isolation levels are rarely used because they lower the number of people that can work in the database at the same time. At the highest level, serializable, a reporting query halts any update activity.
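In Spring, the isolation level is a declarative attribute on the transactional method. A minimal sketch, with the same placeholder table and helpers as earlier; note that Oracle itself only offers READ COMMITTED and SERIALIZABLE, so SERIALIZABLE is the level to request there:
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

// The whole INSERT + SELECT + dump runs in one SERIALIZABLE transaction.
@Transactional(isolation = Isolation.SERIALIZABLE)
public void insertAndDump(String newItem) {
    jdbcTemplate.update("INSERT INTO items (name) VALUES (?)", newItem);
    List<String> allItems =
            jdbcTemplate.queryForList("SELECT name FROM items", String.class);
    dumpToFile(allItems); // placeholder for writing the versioned file
}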

I think you need to serialize the whole transaction. While a SELECT ... FOR UPDATE could work, it does not really buy you anything, since you would be selecting all rows. You may as well just take and release a lock, using DBMS_LOCK.

Related

Is Oracle DB truly isolated during execution of COMMIT?

Consider these two transactions:
INSERT INTO foo VALUES (1, 2, 'bar');
INSERT INTO foo VALUES (1, 4, 'xyz');
COMMIT;
and
SELECT * FROM foo;
Is there any point in time when the SELECT would see only one row inserted from the first transaction?
So far I couldn't find any evidence that the data become visible only after the COMMIT has successfully finished. Since Oracle writes the redo log during commit, it writes it in a serial fashion, am I right? So there is a point where the first row is written but not yet the second one. And since writers do not block readers in Oracle, if the SELECT hits exactly this window, then it sees only one row. Or is there some other locking mechanism?
No.
The data will not exist until the commit has been successful.
see ATOMICITY
Of course, in the same session you can see the uncommitted data, e.g.:
INSERT INTO foo VALUES (1, 2, 'bar');
SELECT * FROM foo;
INSERT INTO foo VALUES (1, 4, 'xyz');
COMMIT;
The SELECT will show the inserted data even though the COMMIT has not yet been executed.
Nope. It's impossible to see just one row.
I don't have the exact implementation details, but the main idea is that every record has an associated last-modifying transaction number. When another transaction reads the data, it checks the status of the transaction that last modified each record (and its own isolation level) and fetches only the records it is allowed to see. (This is pretty common for any MVCC database.)
Moreover, even when the reading transaction has the READ COMMITTED isolation level, each query takes a snapshot of the currently active transaction statuses before it executes and uses that snapshot to perform the check above. It effectively means that every individual query runs at a SNAPSHOT isolation level. (This is an Oracle-specific feature.)
More details here: https://docs.oracle.com/cd/E25054_01/server.1111/e25789/consist.htm
Check the multiversion read and the statement level read consistency parts.

Are inserts with sequence numbers nextval atomic for this number?

If I am inserting rows into a table in auto-commit mode, where one column is defined by a sequence's nextval value, are these values guaranteed to become visible in the order they are inserted? I am wondering whether the following scenario, with three concurrent connections, is possible:
Connection 1 inserts foo
Connection 2 inserts bar
Connection 3 selects all rows and observes bar with sequence number 2 but not foo with sequence number 1
Oracle sequences are thread-safe and the numbers are always generated in order. It is guaranteed that the numbers produced are unique.
But you might not see an insert from another session immediately if that session still has an open transaction. This might create a temporary gap in the sequence you are seeing from SELECTs.
Furthermore, if a transaction that has called NEXTVAL is rolled back, this will cause a permanent gap in the sequence. Sequences are not affected by rollbacks or commits; an increment is always immediate and definitive.
See: CREATE SEQUENCE (Oracle Help Center)
"Auto-commit" is not a concept of the Oracle database. That is, there is no "auto-commit" mode or feature in the database -- it is only implemented in tools (like SQL*Plus).
A tool can implement "auto-commit" in different ways, but in most cases, it's probably along the lines of this:
(user's command, e.g., INSERT INTO ...)
<success response from Oracle server>
COMMIT;
In this case, the COMMIT does not get issued by the tool until there is a positive response from the server that the user's command has been executed. In a networked environment with >10ms latency, plus the vagaries of multithreading on the Oracle server itself, I would say there could be situations where session #2's automatic COMMIT gets processed on the server before session #1's and that, therefore, it is possible for session #3 to observe "bar" but not "foo".
The COMMIT timing of each session relative to the time at which session #3 starts its query is the only thing that matters. Session #3 will see whatever work session #1 and/or session #2 have committed as of the time session #3's query starts.

Consecutive application threads and uncommitted data in Oracle

Our application reads a record from an Oracle 'Event' table. When the event record exists we update the 'count' field of that record. If the record doesn't exist we insert it. So we want only 1 record for a particular event in the table.
The problem with this is probably quite predictable: one application thread will read the table, see that the event is not there, insert the new event and commit. But before it commits, a second thread will also read the table and see that the event is not there. Then both threads will insert the event, and we end up with 2 records for the same event.
I guess synchronizing access to this particular method in our application will prevent this problem, but what is the best option in Oracle to prevent it? Will MERGE, for example, always prevent this problem?
Serialising access to the procedure that implements this functionality would be trivial to implement, using DBMS_LOCK to define and take an exclusive lock.
Serialising through SQL based methods is practically impossible, due to the read consistency model.
CREATE TABLE EVENTS (ID NUMBER PRIMARY KEY, COUNTER NUMBER NOT NULL);
MERGE INTO EVENTS
USING (SELECT ID, COUNTER FROM DUAL LEFT JOIN EVENTS ON EVENTS.ID = :EVENT_ID) SRC
ON (EVENTS.ID = SRC.ID)
WHEN MATCHED THEN UPDATE SET COUNTER = SRC.COUNTER + 1
WHEN NOT MATCHED THEN INSERT (ID, COUNTER) VALUES (:EVENT_ID, 1);
Simple SQL that secures a single record for each ID and consistently increases the counter, no matter which application fires it or how many concurrent threads there are. You don't need to code anything at all, and it's very lightweight as well.
It also doesn't produce any exceptions related to data consistency, so you don't need any special handling.
UPDATE: It actually produces a unique-constraint violation if both threads are inserting. I thought the second MERGE would switch to an update, but it doesn't.
UPDATE: I just tested the same case on SQL Server; when executing in parallel and the record doesn't exist, one MERGE inserts and the second updates.
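Given that caveat, a caller can still keep the MERGE approach by retrying once when the insert branch loses the race. Spring translates ORA-00001 into org.springframework.dao.DuplicateKeyException, so a hedged sketch (JdbcTemplate wiring assumed, table and column names taken from the answer above) could look like this:
// If two sessions MERGE the same not-yet-existing id, the loser gets a
// unique-key violation once the winner commits; retrying then takes the
// WHEN MATCHED branch and simply increments the counter.
public void recordEvent(long eventId) {
    try {
        mergeEvent(eventId);
    } catch (DuplicateKeyException e) {
        mergeEvent(eventId);
    }
}

private void mergeEvent(long eventId) {
    jdbcTemplate.update(
        "MERGE INTO events " +
        "USING (SELECT id, counter FROM dual LEFT JOIN events ON events.id = ?) src " +
        "ON (events.id = src.id) " +
        "WHEN MATCHED THEN UPDATE SET counter = src.counter + 1 " +
        "WHEN NOT MATCHED THEN INSERT (id, counter) VALUES (?, 1)",
        eventId, eventId);
}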

Finding all statements involved in a deadlock from an Oracle trace file?

As I understand it, the typical case of a deadlock involving row locking requires four SQL statements: two in one transaction to update row A and row B, and then a further two in a separate transaction to update the same rows, requiring the same locks, but in the reverse order.
Transaction 1 gets the lock on row A before transaction 2 can request it, transaction 2 gets the lock on row B before transaction 1 can get it, and neither can get the remaining locks it requires. One of the two transactions has to be rolled back so the other can complete.
When I review an Oracle trace file after a deadlock, it only seems to highlight two queries, which appear to be the last one issued by each transaction.
How can I identify the other statements involved in each transaction, or is this missing in an Oracle trace file?
I can include relevant bits of the specific trace file if required.
You're correct, in a typical row-level deadlock, you'll have session 1 execute sql_a that will lock row 1. Then session 2 will execute sql_b that will lock row 2. Then session 1 will execute sql_c to attempt to lock row 2, but session 2 has not committed, and so session 1 starts waiting. Finally, session 2 comes along, and it issues sql_d, attempting to lock row 1, but, since session 1 holds that lock, it starts waiting. Three seconds later, the deadlock is detected, and one of the sessions will catch ORA-00060 and the trace file is written.
In this scenario, the trace file will contain sql_c and sql_d, but not sql_a or sql_b.
The problem is that that information just isn't available anywhere. Consider what happens when you execute a DML statement: it starts a transaction if one doesn't already exist, generates a bunch of undo and redo, and the change is made. But once that happens, the session is no longer associated with that SQL statement. There's really no clean way to go back and find that information.
sql_c and sql_d, on the other hand, are the statements that were associated with those sessions when the deadlock occurred, so, clearly, Oracle can identify them, and include that in the trace file.
So, you're correct, the information about sql_a and sql_b is not in the trace, and it's really not readily available.
Hope that helps.

Do the time of the COMMIT and ROLLBACK affect performance?

Suppose I have a set of IDs. For each ID, I will insert many records into many different tables based on that ID. Between inserting into the different tables, various business checks will be called. If any check fails, all the records inserted for that ID will be rolled back. This bulk insert is done using PL/SQL. Does the timing of the COMMIT and ROLLBACK affect performance, and how? For example, should I COMMIT after finishing the process for one ID, or COMMIT after finishing all IDs?
This is not so much of a performance decision but a process design decision. Do you want the other IDs to stay in the database when you have to roll back a faulty ID?
For obvious reasons, rollback takes longer when more rows must be rolled back. Rollback usually takes longer (sometimes much longer!) than the operations that have to be rolled back. Commit is always fast in Oracle, so it probably doesn't matter how often you commit in that regard.
Your problem description indicates you have a large set of smaller logical transactions (each new ID is a transaction). You should commit each logical transaction. The two reasons to wait to commit the entire set of transactions are:
If the entire set of transactions is in fact a transaction itself - all inserts must succeed for any rows to be committed. In that context, your smaller "transactions" aren't truly transactions.
You don't have a restart capability in your bulk load process, which in effect makes this a special case of item 1. If your bulk load process aborts, you need a way to skip successfully applied ID's.
Tom Kyte's advice is to commit each logical unit of work - the transaction.
Don't make the transaction any longer than necessary; keep it as short as you can. Your statements create locks, and these locks may cause performance issues, so do it ID by ID.
There are two "forces" at work: locking and synchronous writes.
Locking: during your open transaction, Oracle puts locks on the changed rows. Whenever another transaction needs to update any of the locked rows, it has to wait. In the worst case, you can even build a deadlock.
Synchronous write: every commit performs a synchronous write. (There are ways to disable that, but it is usually the thing everybody wants: integrity.) That synchronous write can take (much) longer than a regular write (which can be buffered). Not to forget that there is usually an additional network round trip involved with a commit.
So one force says "commit as soon as possible (considering your integrity requirements)"; the other says "commit as rarely as possible".
There are some other issues to consider as well, e.g. the maximum transaction size. Every uncommitted transaction needs some temporary space; the bigger the transaction gets, the more you need. You can also run into ORA-01555 "snapshot too old".
If there is any advice to give, then it is to implement a configurable "commit frequency" so that you can easily change it as needed.
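As a rough illustration of that last point, if the batch were driven from application code with Spring's TransactionTemplate rather than a single PL/SQL block, a configurable commit frequency might look like this sketch (transactionTemplate, insertRecordsFor and the chunk size are all assumptions):
// Commit once per chunk of IDs instead of per ID or per whole set;
// commitEvery is the configurable "commit frequency".
public void loadAll(List<Long> ids, int commitEvery) {
    for (int i = 0; i < ids.size(); i += commitEvery) {
        List<Long> chunk = ids.subList(i, Math.min(i + commitEvery, ids.size()));
        transactionTemplate.execute(status -> {
            for (Long id : chunk) {
                insertRecordsFor(id); // hypothetical per-ID inserts and business checks
            }
            return null; // the chunk commits when this callback returns normally
        });
    }
}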
One option, if you need to control the individual sets but retain the ability to commit or roll back the entire transaction, is to use savepoints. You can set a savepoint at the beginning of the outermost loop, then roll back to it if an error occurs. You might end up with something like this:
begin
  -- Initial batch logging
  for r_record in cur_cursor loop
    savepoint s_cursor;
    begin
      -- Process rows
    exception
      when others then
        rollback to s_cursor;
    end;
  end loop;
  -- Final batch logging
exception
  when others then
    rollback;
    raise;
end;

Resources