How to make MERGE serializable - oracle

When doing concurrent MERGEs while every session uses a different value (shown as *** in the snippet below) for the primary key column id, everything is fine if I do it manually in 2 terminal sessions.
MERGE
INTO x
USING (SELECT *** as id FROM DUAL) MERGE_SRC
ON (x.id = MERGE_SRC.id)
WHEN MATCHED THEN UPDATE SET val = val + 1 WHERE id = ***
WHEN NOT MATCHED THEN INSERT VALUES (***, 99);
COMMIT;
However, running a multi-threaded load test with 3 or more threads, I will relatively quickly run into ORA-08177 with locked table. Why is that? (And why is it non-deterministic in that it does not always happen when transactions overlap?)
The table was created using
create table x (id int primary key, val int);
SQL Server btw never throws exceptions with an equivalent MERGE statement, running the same experiment. That is even true when working on the same row simultaneously.
Is it because perhaps MERGE is not atomic, and the serializable mode runs optimistically, so that the race might only show with sufficient contention? Still, why does it happen even when not working on the same row concurrently?
Btw, my attempts to fix this using the strictest lock available were unsuccessful. So any ideas on how to make this atomic are very much appreciated. It looks like relaxing the isolation level would rid me of the exception, but risk inconsistencies in case there turn out to be 2 updates on the same row (otherwise why would it balk in serializable mode in the first place).

The exception you're seeing is a direct consequence of using strict serialization. If you have more than one transaction active simultaneously, each started with SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, when any one of them commits the others will get an ORA-08177. That's how strict serialization is enforced - the database throws an ORA-08177 in any session started with ISOLATION LEVEL SERIALIZABLE if another transaction commits into a table which the serializable session needs. So, basically, if you really need strict serialization you have to handle the ORA-08177's intelligently, as in the following:
DECLARE
bSerializable_trans_complete BOOLEAN := FALSE;
excpSerializable EXCEPTION;
PRAGMA EXCEPTION_INIT(excpSerializable, -08177);
BEGIN
<<SERIALIZABLE_LOOP>>
WHILE NOT bSerializable_trans_complete
LOOP
BEGIN
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
MERGE ...; -- or whatever
COMMIT;
bSerializable_trans_complete := TRUE; -- allow SERIALIZABLE_LOOP to exit
EXCEPTION
WHEN excpSerializable THEN
ROLLBACK;
CONTINUE SERIALIZABLE_LOOP;
END;
END LOOP; -- SERIALIZABLE_LOOP
END;
Serialization is not magic, and it's not "free" (where "free" means "I as the developer don't have to do anything to make it work properly"). It requires more planning and work on the part of the developer to have it function properly, not less. Share and enjoy.

In Oracle, the SERIALIZABLE mode works optimistically, in contrast to e.g. SQL Server, which does pessimistic locking in that mode. Which means that in the latter case you can even concurrently change the same row without running into exceptions.
Despite the docs:
Oracle Database permits a serializable transaction to modify a row only if changes to the row made by other transactions were already committed when the serializable transaction began. The database generates an error when a serializable transaction tries to update or delete data changed by a different transaction that committed after the serializable transaction began:
load testing showed that exceptions may also get thrown when not working on the same row simultaneously, although that is not guaranteed, in contrast to when working on the same row, which will always result in an ORA-08177.

Related

"Freeze" data state in an Oracle PL/SQL procedure

I have a quite complicated PL/SQL procedure selecting data dynamically and multiple times. Eventually, the data is aggregated and returned.
However, due to multiple selects, the returned data might be inconsistant, because data changes might happen between the different selects. A process might look like this:
Select data 1
Change from a different session
Select data 2
--> Data 2 might now be inconsistant with data 1 due to the changes.
Unfortunately, I cannot select all data at once with a single select. This would be too complicated.
Is there a way in Oracle to somehow "freeze" the data state (not respecting any incoming changes) until the procedure is finished?
You are looking for Oracle's Serializable Isolation Level:
In the serializable isolation level, a transaction sees only changes committed at the time the transaction—not the query—began and changes made by the transaction itself.
It is started like this:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Database read locking

I have a use case where I need to do the following things in one transaction:
start the transaction
INSERT an item into a table
SELECT all the items in the table
dump the selected items into a file (this file is versioned and another program always uses the latest version)
If all the above things succeed, commit the transaction, if not, rollback.
If two transactions begin almost simultaneously, it is possible that before the first transaction A commits what it has inserted into the table (step 4), the second transaction B has already performed the SELECT operation(step 2) whose result doesn't contain yet the inserted item by the first transaction(as it is not yet committed by A, so not visible to B). In this case, when A finishes, it will have correctly dumped a file File1 containing its inserted item. Later, B finishes, it will have dumped another file File2 containing only its inserted item but not the one inserted by A. Since File2 is more recent, we will use File2. The problem is that File2 doesn't contain the item inserted by A even though this item is well in the DB.
I would like to know if it is feasible to solve this problem by locking the read(SELECT) of the table when a transaction inserts something into the table until its commit or rollback and if yes, how this locking can be implemented in Spring with Oracle as DB.
You need some sort of synchronization between the transactions:
start the transaction
Obtain a lock to prevent the transaction in another session to proceed or wait until the transaction in the other session finishes
INSERT an item into a table
SELECT ......
......
Commit and release the lock
The easiest way is to use LOCK TABLE command, at least in SHARE mode (SHARE ROW EXCLUSIVE or EXCLUSIVE modes can also be used, but they are too restrictve for this case).
The advantage of this approach is that the lock is automatically released at commit or rollback.
The disadvantage is a fact, that this lock can interfere with other transactions in the system that update this table at the same time, and could reduce an overall performance.
Another approach is to use DBMS_LOCK package. This lock doesn't affect other transactions that don't explicitely use that lock. The drawaback is that this package is difficult to use, the lock is not released on commit nor rollback, you must explicitelly release the lock at the end of the transaction, and thus all exceptions must be carefully handled, othervise a deadlock easily could occur.
One more solution is to create a "dummy" table with a single row in it, for example:
CREATE TABLE my_special_lock_table(
int x
);
INSERT INTO my_special_lock_table VALUES(1);
COMMIT:
and then use SELECT x FROM my_special_lock_table FOR UPDATE
or - even easier - simple UPDATE my_special_lock_table SET x=x in your transaction.
This will place an exclusive lock on a row in this table and synchronize only this one transaction.
A drawback is that another "dummy" table must be created.
But this solution doesn't affect the other transactions in the system, the lock is automatically released upon commit or rollback, and it is portable - it should work in all other databases, not only in Oracle.
Use spring's REPEATABLE_READ or SERIALIZABLE isolation levels:
REPEATABLE_READ A constant indicating that dirty reads and
non-repeatable reads are prevented; phantom reads can occur. This
level prohibits a transaction from reading a row with uncommitted
changes in it, and it also prohibits the situation where one
transaction reads a row, a second transaction alters the row, and the
first transaction rereads the row, getting different values the second
time (a "non-repeatable read").
SERIALIZABLE A constant indicating that dirty reads, non-repeatable
reads and phantom reads are prevented. This level includes the
prohibitions in ISOLATION_REPEATABLE_READ and further prohibits the
situation where one transaction reads all rows that satisfy a WHERE
condition, a second transaction inserts a row that satisfies that
WHERE condition, and the first transaction rereads for the same
condition, retrieving the additional "phantom" row in the second read.
with serializable or repeatable read, the group will be protected from non-repeatable reads:
connection 1: connection 2:
set transaction isolation level
repeatable read
begin transaction
select name from users where id = 1
update user set name = 'Bill' where id = 1
select name from users where id = 1 |
commit transaction |
|--> executed here
In this scenario, the update will block until the first transaction is complete.
Higher isolation levels are rarely used because they lower the number of people that can work in the database at the same time. At the highest level, serializable, a reporting query halts any update activity.
I think you need to serialize the whole transaction. While a SELECT ... FOR UPDATE could work, it does not really buy you anything, since you would be selecting all rows. You may as well just take and release a lock, using DBMS_LOCK()

PostgreSQL vs. Oracle default transaction management

In PostgreSQL, if you encounter an error in transaction (for example when your insert statement violates unique constraint), the whole transaction is aborted, you cannot commit it and no rows are inserted:
database=# begin;
BEGIN
database=# insert into table (id, something) values ('1','whatever');
INSERT 0 1
database=# insert into table (id, something) values ('1','whatever');
ERROR: duplicate key value violates unique constraint "table_id_key"
Key (id)=(1) already exists.
database=# insert into table (id, something) values ('2','whatever');
ERROR: current transaction is aborted, commands ignored until end of transaction block
database=# rollback;
database=# select * from table;
id | something |
-----+------------+
(0 rows)
You can change that by setting ON_ERROR_ROLLBACK to "on" or "interactive", after that you can do multiple inserts ignoring errors, commit and have only successfully inserted rows in table after transaction end.
database=# \set ON_ERROR_ROLLBACK interactive
In Oracle, this is the default transaction management behaviour, which surprises me. Isn't this completely counterintuitive and dangerous?
When I start a transaction I want to be sure that all the statements were successfull. What if my multiple inserts comprise some kind of an object or data structure? I end up completely unaware of the data state in my database and should be checking it after the commit.
If one of the inserts fails I want to be sure that other inserts will be rollbacked or not even evaluated after the first error, which is exactly how it's done in PostgreSQL.
Why does Oracle have such way of transaction management as a default, and why is it considered good practice?
For example, some random guy here in comments
This is a very neat feature.
I don't understand this, though: "Normally, any error you make will
throw an exception and cause your current transaction to be marked as
aborted. This is sane and expected behavior..."
No, it's really not. Oracle doesn't work this way, nor does MySQL. I
have no experience with MSSQL or DB2 but I'll bet a dollar each they
don't work this way either. There no intuitive reason why a syntax
error, or any other error for that matter, should abort a transaction.
I can only assume there's either some limitation deep in the Postgres
guts that requires this behavior, or that it conforms to some obscure
part of the SQL standard that everyone else sensibly ignores. There's
certainly no API / UX reason why it should work this way.
We really shouldn't be too proud of any workarounds we've developed
for this pathological behavior. It's like IT Stockholm Syndrome.
Does not it violate even the definition of the transaction?
Transactions provide an "all-or-nothing" proposition, stating that
each work-unit performed in a database must either complete in its
entirety or have no effect whatsoever.
I agree with you. I think it's a mistake not to abort the whole tx. But people are used to that, so they think it's reasonable and correct. Like people who use MySQL think that the DBMS should accept 0000-00-00 as a date, or people using Oracle expect that '' IS NULL.
The idea that there's a clear distinction between a syntax error and something else is flawed.
If I write
BEGIN;
CREATE TABLE new_customers (...);
INSET INTO new_customers (...)
SELECT ... FROM customers;
DROP TABLE customers;
COMMIT;
I don't care that it's a typo resulting in a syntax error that caused me to lose my data. I care that the transaction didn't successfully execute all its statements but still committed.
It'd be technically feasible to allow soft rollback in PostgreSQL before any rows are actually written by a statement - probably before we even enter the executor. So failures in the parse and parameter binding phases could allow the tx not to be aborted. We have a statement memory context we could use to clean up.
However, once the statement starts changing rows, it's doing so on disk with the same transaction ID as the prior statements in the tx. So you can't roll it back without rolling back the whole tx. To allow statement rollback Pg needs to assign a new subtransaction ID. That costs resources. You can do it explicitly with SAVEPOINTs when you want to, and internally that's what psql is doing. In theory we could allow the server to do this implicitly for each statement to implement statement rollback, just at a performance cost. But I doubt any patch implementing this would get committed, at least not without a LOT of argument, because most of the PostgreSQL team are (IMO reasonably) not fond of "whoops, that broke but we'll continue anyway" transaction semantics.

Consecutive application threads and uncommitted data in Oracle

Our application reads a record from an Oracle 'Event' table. When the event record exists we update the 'count' field of that record. If the record doesn't exist we insert it. So we want only 1 record for a particular event in the table.
The problem with this is probably quite predictable: one application thread will read the table, see the event is not there, insert the new event and commit. But before it commits a second thread will also read the table and see the event is not there. And then both threads will insert the event and we end up with 2 records for the same event.
I guess synchronizing access to this particular method in our application will prevent this problem, but what is the best option in Oracle to prevent this? Will MERGE for example always prevent this problem?
Serialising access to the procedure that implements this functionality would be trivial to implement, using DBMS_LOCK to define and take an exclusive lock.
Serialising through SQL based methods is practically impossible, due to the read consistency model.
CREATE TABLE EVENTS (ID NUMBER PRIMARY KEY, COUNTER NUMBER NOT NULL);
MERGE INTO EVENTS
USING (SELECT ID, COUNTER FROM DUAL LEFT JOIN EVENTS ON EVENTS.ID = :EVENT_ID) SRC
ON (EVENTS.ID = SRC.ID)
WHEN MATCHED THEN UPDATE SET COUNTER = SRC.COUNTER + 1
WHEN NOT MATCHED THEN INSERT (ID, COUNTER) VALUES (:EVENT_ID, 1);
Simple SQL securing single record for each ID and consistently increasing the counter no matter what application fires it or number of concurrent thread. You don't need to code anything at all and it's very lightweight as well.
It also doesn't produce any exception related to data consistency so you don't need any special handling.
UPDATE: It actually produces unique violation exception if both threads are inserting. I thought the second merge would switch to update, but it doesn't.
UPDATE: Just tested the same case on SQL Server and when executing in parallel and the record doesn't exist one MERGE inserts and the second updates.

Do the time of the COMMIT and ROLLBACK affect performance?

Suppose I have a set of ID . For each ID , I will insert many records to many different tables based on the ID .Between inserting difference tables, different business checks will be called . If any checking fail , all the records that are inserted based on this ID will be ROLLBACK .This bulk insert action is done through using PL/SQL . Do the time of the COMMIT and ROLLBACK affect the performance and how does it affect ? For example , should I COMMIT after finish the process for one ID or COMMIT after finish all ID?
This is not so much of a performance decision but a process design decision. Do you want the other IDs to stay in the database when you have to roll back a faulty ID?
For obvious reasons, rollback takes longer when more rows must be rolled back. Rollback usually takes longer (sometimes much longer!) than the operations that have to be rolled back. Commit is always fast in Oracle, so it probably doesn't matter how often you commit in that regard.
Your problem description indicates you have a large set of smaller logical transactions (each new ID is a transaction). You should commit each logical transaction. The two reasons to wait to commit the entire set of transactions are:
If the entire set of transactions is in fact a transaction itself - all inserts must succeed for any rows to be committed. In that context, your smaller "transactions" aren't truly transactions.
You don't have a restart capability in your bulk load process, which in effect makes this a special case of item 1. If your bulk load process aborts, you need a way to skip successfully applied ID's.
Tom Kyte's advice is to commit each logical unit of work - the transaction.
Don't take the transaction time longer. make it short as possible as you can. Because according to your query some locks have been created. This locks may cause perfomance issues... so do it ID by ID...
There are two "forces" at work....
locking
during your open transaction, oracle puts locks on the changed rows.
whenever another transaction needs to update any of the locked rows,
it has to wait.
in the worst case, you can even build a deadlock.
synchronous write
every commit performs a synchronous write.
(there are ways to disable that, but it is usually the thing everybody wants: integrity).
that synchronous write can take (much) longer then the a regular write (that can be buffered).
Not to forget that there is usually an additional network round trip involved with an commit.
so, the one force says "commit as soon as possible (considering your integrity requirements)" the other says "commit as as less often as possible".
There are some other issues to consider as well, e.g. the maximum transaction size. every uncommited transaction needs some temporary space. the bigger the transaction gets, the more you need. You can also run into ORA-01555 "snapshot too old".
If there is any advice to give, then it is to implement a configurable "commit frequency" so that you can easily change it as needed.
One option if you need to control the individual sets but retain the ability to commit or rollback the entire transaction is to use savepoints. You can set a savepoint at the beginning of the outermost loop, then rollback to it if an error occurs. You might end up with something like this:
begin
--Initial batch logging
for r_record in cur_cursor loop
savepoint s_cursor loop;
begin
--Process rows
exception
when others then
rollback to s_cursor;
end;
end loop;
--Final batch logging
exception
when others then
rollback;
raise;
end;

Resources