Many parallel transactions vs transaction isolation level in Oracle DB

My app receives requests through WebLogic Server and uses EJB with EclipseLink as the ORM.
The business logic looks as follows:
Select all records from table A where A.col = 'ABC' ('ABC' is a value from the request).
If none of them satisfy some condition, create a new one.
Now suppose that 10 parallel requests have been sent with the same payload.
I have the default isolation level in my Oracle DB (READ_COMMITTED). In this case, many transactions are performed in parallel:
Req1 start T1
Req2 start T2
T1 select rows
T2 select rows
T1 insert new one (no rows with col = 'ABC')
T1 COMMIT
T2 insert new one (no rows with col = 'ABC')
T2 COMMIT
As a result, 1-10 rows are created instead of 1.
Oracle doesn't have a REPEATABLE_READ isolation level, and SERIALIZABLE has a negative impact on throughput.

The PESSIMISTIC_WRITE lock mode is the solution.
PESSIMISTIC_WRITE acquires an exclusive lock on the selected rows using SELECT ... FOR UPDATE (in Oracle). JPA accepts a LockModeType as an argument to several methods, e.g. find.
Of course, this is consistency at the expense of throughput.
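For illustration, a minimal sketch of the SQL such a pessimistic lock roughly corresponds to in Oracle, using the table and column from the question (the exact statement emitted depends on the JPA provider):
-- Lock the matching rows; a concurrent transaction issuing the same locking query
-- blocks here until this transaction commits or rolls back.
SELECT *
  FROM a
 WHERE a.col = 'ABC'
   FOR UPDATE;
-- Note: FOR UPDATE only locks rows that already exist, so serializing the creation of the
-- very first matching row still needs e.g. a unique constraint or a lock on a parent row.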

Related

JpaItemWriter<T> still performs writes one item at a time instead of in batch

I have a question about writing operations in Spring Batch on databases through the ItemWriter<T> contract. To quote from The Definitive Guide to Spring Batch by Michael T. Minella:
All of the items are passed in a single call to the ItemWriter where they can be written out at once. This single call to the ItemWriter allows for IO optimizations by batching the physical write. [...] Chunks are defined by their commit intervals. If the commit interval is set to 50 items, then your job reads in 50 items, processes 50 items, and then writes out 50 items at once.
Yet when I use, say, HibernateItemWriter or JpaItemWriter in a step-based job to write to the database in a Spring-Boot-based app with all the Spring Batch infrastructure in place (@EnableBatchProcessing, Step/JobBuilderFactory, etc.), together with monitoring tools that count insert/update statements (such as implementations of the MethodInterceptor interface), I notice that the number of inserts performed by the writer equals the total number of records to process, not the number of chunks configured for that job.
For example, upon inspection of the logs in IntelliJ from a job execution of 10 items with a chunk size of 5, I found 10 insert statements
Query:["insert into my_table (fields...
instead of 2. I also checked for insert statements in the general_log_file for my RDS instance and found two 'Prepare insert' statements and one 'Execute insert' statement for each item to process.
Now I understand that a writer such as JpaItemWriter<T> loops through the items in its write(List<? extends T> items) method, calling entityManager.persist/merge(item) - thereby inserting a new row into the corresponding table - and eventually entityManager.flush(). But where is the performance gain provided by the batch processing, if there is any?
where is the performance gain provided by the batch processing, if there is any?
There is a performance gain, and it comes from the chunk-oriented processing model that Spring Batch offers, in the sense that all of these insert statements are executed in a single transaction:
start transaction
INSERT INTO table ... VALUES ...
INSERT INTO table ... VALUES ...
...
INSERT INTO table ... VALUES ...
end transaction
You would see a performance hit if there was a transaction for each item, something like:
start transaction
INSERT INTO table ... VALUES ...
end transaction
start transaction
INSERT INTO table ... VALUES ...
end transaction
...
But that is not the case with Spring Batch, unless you set the chunk-size to 1 (but that defeats the goal of using such a processing model in the first place).
So yes, even if you see multiple insert statements, that does not mean that there are no batch inserts. Check the transaction boundaries in your DB logs and you should see a transaction around each chunk, not around each item.
As a side note, from my experience, using raw JDBC performs better than JPA (with any provider) when dealing with large inserts/updates.
Performance can be improved by batching inserts with the following configuration
spring.jpa.properties.hibernate.jdbc.batch_size=?
For example, with a batch_size of 3 and a chunk size of 3, when a chunk is committed the three inserts are sent to the database as a single JDBC batch, conceptually equivalent to the following SQL
INSERT INTO my_table (id, name)
VALUES (1, 'Pete'), (2, 'Pam'), (3, 'Paul');
rather than multiple single inserts
INSERT INTO my_table (id, name) VALUES (1, 'Pete');
INSERT INTO my_table (id, name) VALUES (2, 'Pam');
INSERT INTO my_table (id, name) VALUES (3, 'Paul');
The following blog highlights its use:
https://vladmihalcea.com/the-best-way-to-do-batch-processing-with-jpa-and-hibernate/

Switching Views between two tables of the same schema in runtime in Oracle

My client has two tables that have identical columns, I'll call them T1 and T2.
One view, V, points to T1 while some batch process works on T2. T2 has to be truncated and its data reloaded afresh; T1 holds the latest data until the reload into T2 completes.
When the batch is completed, the view is replaced so that V points to T2. This switch takes place back and forth once a day.
My questions are:
When a CREATE OR REPLACE VIEW V ... is executed to switch V from T1 to T2, while concurrent queries against V keep coming in (either plain SQL or stored procedures), is there a point during the switch at which a query may fail?
Is there a better design where, instead of switching view V between tables, the data can be reloaded and read at the same time?
We have a similar process in my company. This is handled by using synonyms rather than a view, for example:
create or replace synonym tab_syn for user_tables; -- syn set-up for the process
select count(*) from tab_syn; -- run your process
create or replace synonym tab_syn for all_tables; -- syn next run
select count(*) from tab_syn; -- next run
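Applied to the tables in the question, the same idea could look roughly like this (T1/T2 as in the question, data_syn is a hypothetical synonym name):
create or replace synonym data_syn for t1; -- readers query data_syn and currently see T1
truncate table t2;                         -- the batch reloads T2 in the background
-- ... reload T2 here ...
create or replace synonym data_syn for t2; -- flip: readers now see T2; the next run reloads T1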

Atomic read and write in one query in Oracle

I have an app that requires a single instance of a task to be run at a time. In order to check whether an instance of the task is already running, I check the status of the task. If the task has a combination of one or more of those statuses, then the app knows the task is already running and should skip it for now. These tasks can be called from multiple places, so I could have a hundred or so calls for the task to be run in a minute.
I have the following query on Oracle 11g.
The SQL:
INSERT INTO Log2 (LogID, Action, EventDateTime)
SELECT 102211, 2, SYSDATE FROM dual WHERE NOT EXISTS
(SELECT LogID FROM Log2 T3 WHERE T3.Param2 = 102 AND T3.Action = 34 AND T3.AuditLogID NOT IN
(SELECT T2.LogID FROM Log2 T1, Log2 T2 WHERE (T1.Action IN (1,2,3) AND T2.Action = 6 AND T1.Param3=T2.Param3 AND T1.Param2 = 102))
);
At the moment the above query will sometimes allow 2 records to be inserted at the same time (EventDateTime tells me that). How can I ensure that this does not happen? Do I need to add some locking hints? I thought the whole query was atomic. The JDBC connection is on auto-commit.
There are several parts of the app that update this table. I only want this locking to happen for the task items as stated above. Other parts of the app that add records to this Log2 table only ever insert one record at a time, so this single-instance behaviour is not required for those other parts.
Thanks

Write Skew anomaly in Oracle and PostgreSQL does not rollback transaction

I noticed the following occurrence in both Oracle and PostgreSQL.
Considering we have the following database schema:
create table post (
id int8 not null,
title varchar(255),
version int4 not null,
primary key (id));
create table post_comment (
id int8 not null,
review varchar(255),
version int4 not null,
post_id int8,
primary key (id));
alter table post_comment
add constraint FKna4y825fdc5hw8aow65ijexm0
foreign key (post_id) references post;
With the following data:
insert into post (title, version, id) values ('Transactions', 0, 1);
insert into post_comment (post_id, review, version, id)
values (1, 'Post comment 1', 459, 0);
insert into post_comment (post_id, review, version, id)
values (1, 'Post comment 2', 537, 1);
insert into post_comment (post_id, review, version, id)
values (1, 'Post comment 3', 689, 2);
If I open two separate SQL consoles and execute the following statements:
TX1: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
TX2: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
TX1: SELECT COUNT(*) FROM post_comment where post_id = 1;
TX1: > 3
TX1: UPDATE post_comment SET version = 100 WHERE post_id = 1;
TX2: INSERT INTO post_comment (post_id, review, version, id) VALUES (1, 'Phantom', 0, 1000);
TX2: COMMIT;
TX1: SELECT COUNT(*) FROM post_comment where post_id = 1;
TX1: > 3
TX1: COMMIT;
TX3: SELECT * from post_comment;
> 0;"Post comment 0";100;1
1;"Post comment 1";100;1
2;"Post comment 2";100;1
1000;"Phantom";0;1
As expected, the SERIALIZABLE isolation level has kept the snapshot data from the beginning of the TX1 transaction and TX1 only sees 3 post_comment records.
Because of the MVCC model in Oracle and PostgreSQL, TX2 is allowed to insert a new record and commit.
Why is TX1 allowed to commit? Because this is a Write Skew anomaly, I was expecting to see that TX1 would be rolled back with a "Serialization failure exception" or something similar.
Does the MVCC Serializable model in PostgreSQL and Oracle only offer a snapshot isolation guarantee but no Write Skew anomaly detection?
UPDATE
I even changed Tx1 to issue an UPDATE statement that changes the version column for all post_comment records belonging to the same post.
This way, Tx2 creates a new record and Tx1 is going to commit without knowing that a new record has been added that satisfied the UPDATE filtering criteria.
Actually, the only way to make it fail on PostgreSQL is if we execute the following COUNT query in Tx2, prior to inserting the phantom record:
Tx2: SELECT COUNT(*) FROM post_comment where post_id = 1 and version = 0
TX2: INSERT INTO post_comment (post_id, review, version, id) VALUES (1, 'Phantom', 0, 1000);
TX2: COMMIT;
Then Tx1 is going to be rolled back with:
org.postgresql.util.PSQLException: ERROR: could not serialize access due to read/write dependencies among transactions
Detail: Reason code: Canceled on identification as a pivot, during conflict out checking.
Hint: The transaction might succeed if retried.
Most likely the write-skew anomaly prevention mechanism detected this change and rolled back the transaction.
Interestingly, Oracle does not seem to be bothered by this anomaly: since it does not prevent write skew from happening, Tx1 just commits fine.
By the way, you can run all these examples yourself since they are on GitHub.
In the 1995 paper A Critique of ANSI SQL Isolation Levels, Jim Gray and co. described Phantom Read as:
P3: r1[P]...w2[y in P]...(c1 or a1) (Phantom)
One important note is that ANSI SQL P3 only prohibits inserts (and
updates, according to some interpretations) to a predicate whereas the
definition of P3 above prohibits any write satisfying the predicate
once the predicate has been read — the write could be an insert,
update, or delete.
Therefore, a Phantom Read does not mean that you can simply return a snapshot as of the start of the currently running transaction and pretend that providing the same result for a query is going to protect you against the actual Phantom Read anomaly.
In the original SQL Server 2PL (Two-Phase Locking) implementation, returning the same result for a query implied Predicate Locks.
The MVCC (Multi-Version Concurrency Control) Snapshot Isolation (wrongly named Serializable in Oracle) does not actually prevent other transactions from inserting/deleting rows that match the same filtering criteria with a query that already executed and returned a result set in our current running transaction.
For this reason, we can imagine the following scenario in which we want to apply a raise to all employees:
Tx1: SELECT SUM(salary) FROM employee where company_id = 1;
Tx2: INSERT INTO employee (id, name, company_id, salary)
VALUES (100, 'John Doe', 1, 100000);
Tx1: UPDATE employee SET salary = salary * 1.1;
Tx2: COMMIT;
Tx1: COMMIT;
In this scenario, the CEO runs the first transaction (Tx1), so:
She first checks the sum of all salaries in her company.
Meanwhile, the HR department runs the second transaction (Tx2) as they have just managed to hire John Doe and gave him a $100k salary.
The CEO decides that a 10% raise is feasible taking into account the total sum of salaries, being unaware that the salary sum has increased by $100k.
Meanwhile, the HR transaction Tx2 is committed.
Tx1 is committed.
Boom! The CEO has made a decision based on an old snapshot, giving a raise that might not be sustainable given the updated salary budget.
You can view a detailed explanation of this use case (with lots of diagrams) in the following post.
Is this a Phantom Read or a Write Skew?
According to Jim Gray and co, this is a Phantom Read since the Write Skew is defined as:
A5B Write Skew Suppose T1 reads x and y, which are consistent with
C(), and then a T2 reads x and y, writes x, and commits. Then T1
writes y. If there were a constraint between x and y, it might be
violated. In terms of histories:
A5B: r1[x]...r2[y]...w1[y]...w2[x]...(c1 and c2 occur)
In Oracle, the Transaction Manager might or might not detect the anomaly above because it does not use predicate locks or index range locks (next-key locks), as MySQL does.
PostgreSQL manages to catch this anomaly only if Bob issues a read against the employee table, otherwise, the phenomenon is not prevented.
UPDATE
Initially, I was assuming that Serializability would imply a time ordering as well. However, as very well explained by Peter Bailis, wall-clock ordering or Linearizability is only assumed for Strict Serializability.
Therefore, my assumptions were made for a Strict Serializable system. But that's not what Serializable is supposed to offer. The Serializable isolation model makes no guarantees about time, and operations are allowed to be reordered as long as they are equivalent to some serial execution.
Therefore, according to the Serializable definition, such a Phantom Read can occur if the second transaction does not issue any read. But in a Strict Serializable model, the one offered by 2PL, the Phantom Read would be prevented even if the second transaction does not issue a read against the entries we are trying to guard against phantom reads.
What you observe is not a phantom read. That would be the case if a new row showed up when the query is issued the second time (phantoms appear unexpectedly).
You are protected from phantom reads in both Oracle and PostgreSQL with SERIALIZABLE isolation.
The difference between Oracle and PostgreSQL is that SERIALIZABLE isolation level in Oracle only offers snapshot isolation (which is good enough to keep phantoms from appearing), while in PostgreSQL it will guarantee true serializability (i.e., there always exists a serialization of the SQL statements that leads to the same results). If you want to get the same thing in Oracle and PostgreSQL, use REPEATABLE READ isolation in PostgreSQL.
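For reference, a minimal sketch of starting such a transaction in PostgreSQL with its snapshot-isolation level, in the same style as the TX1/TX2 statements above:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- ... queries and DML ...
COMMIT;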
I just wanted to point that Vlad Mihalcea's answer is plain wrong.
Is this a Phantom Read or a Write Skew?
Neither of those -- there is no anomaly here, transactions are serializable as Tx1 -> Tx2.
SQL standard states:
"A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions."
PostgreSQL manages to catch this anomaly only if Bob issues a read against the employee table, otherwise the phenomenon is not prevented.
PostgreSQL's behavior here is 100% correct, it just "flips" apparent transactions order.
The Postgres documentation defines a phantom read as:
A transaction re-executes a query returning a set of rows that satisfy
a search condition and finds that the set of rows satisfying the
condition has changed due to another recently-committed transaction.
Because your select returns the same value both before and after the other transaction committed, it does not meet the criteria for a phantom read.

how INSERT works before issuing a COMMIT in Oracle

My question is how Oracle treats an INSERT transaction before a COMMIT is issued.
While I am doing an INSERT transaction, will Oracle wait until I have inserted all my records within that procedure, so that when I issue a COMMIT statement the records are saved in sequence for this transaction?
In the following code, the first insert that is made is the number of rows (metadata) and then the cursor loops and starts inserting the actual data.
Is there a possibility that, in one transaction, when I call this procedure, first my metadata record is inserted, then some other data (not related to this transaction) is inserted, and then the rest of my data - so that the first record and the rest of the records from the loop are not inserted in a sequence?
-- This code belongs to a procedure that runs whenever a user clicks the insert
-- button from the front end form
DECLARE
rowcnt NUMBER;
CURSOR c_get_employ IS
SELECT EMP.EMPLOYER_ID, EMP.EMPLOYER_NAME, EMP.EMPLOYER_LOCATION
FROM EMP
WHERE EMP.EMPLOYER_COUNTRY = 'USA'
ORDER BY EMP.EMPLOYER_ID;
BEGIN
SELECT COUNT(*)
INTO rowcnt
FROM EMP
WHERE EMP.EMPLOYER_COUNTRY = 'USA';
-- I want to insert the 'number of employee records' that will be inserted (metadata)
INSERT INTO EMP_OUTPUT
(EMPID, EMPNAME, EMPLOC, ECOUNT)
VALUES
(NULL, NULL, NULL, rowcnt);
-- Then loop through the cursor and start inserting the data
FOR c_post_employ IN c_get_employ LOOP
INSERT INTO EMP_OUTPUT
(EMPID, EMPNAME, EMPLOC)
VALUES
(c_post_employ.EMPLOYER_ID,c_post_employ.EMPLOYER_NAME,c_post_employ.EMPLOYER_LOCATION);
END LOOP;
COMMIT;
END;
Another transaction can perform inserts concurrently with your transaction, but your transaction won't see them:
until the other transaction commits (if your transaction is using READ COMMITTED isolation), or
ever (when using SERIALIZABLE isolation) - you'll need to start another transaction to see them.
Whether this will yield correct behavior is for you to decide.
Just be careful about SELECT COUNT(*) ... - it may not return what you expect. Consider the following scenario:
The EMP table is initially empty.
Transaction A starts and inserts a row in EMP, but does not commit.
Transaction B starts and inserts a row in EMP, but does not commit.
Transaction A executes SELECT COUNT(*) FROM EMP and gets 1 (because it sees its own newly inserted row, but does not see B's newly inserted row since B did not commit yet).
Transaction B executes SELECT COUNT(*) FROM EMP and also gets 1 (for the same reason but in reverse).
Transaction A inserts 1 into EMP_OUTPUT and commits.
Transaction B inserts 1 into EMP_OUTPUT and commits (assuming there is no key violation).
So, 1 is inserted despite the table actually having 2 rows!
Unfortunately not even Oracle's SERIALIZABLE isolation will save you from this kind of anomaly. Pretty much the only way to guarantee the "correct" result is to lock the entire table, so that no concurrent inserts (or deletes) can occur.
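A minimal sketch of that approach, using the tables from the question (the lock is held until COMMIT or ROLLBACK; readers are never blocked in Oracle):
LOCK TABLE emp IN EXCLUSIVE MODE;  -- concurrent inserts/updates/deletes on EMP now wait
SELECT COUNT(*) FROM emp WHERE employer_country = 'USA';  -- the count can no longer change under you
-- ... insert the metadata row and the detail rows into EMP_OUTPUT ...
COMMIT;  -- releases the table lock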
Use a single SQL statement if possible. It will have statement-level read consistency, and will be much faster.
insert into emp_output(empid, empname, emploc, ecount)
with employees as
(
select employer_id, employer_name, employer_location
from emp
where employer_country = 'USA'
order by employer_id
)
select null, null, null, count(*) from employees
union all
select employer_id, employer_name, employer_location, null from employees;
The term you want to google for is "read consistency":
http://docs.oracle.com/cd/B12037_01/server.101/b10743/consist.htm
Bottom line:
As you know, if you rollback, it's as though the inserts "never happened"
However, other stuff can (and probably did) "happen" in the meantime.
You need to run in the Serializable Isolation Level:
http://docs.oracle.com/cd/E11882_01/server.112/e16508/consist.htm#BABCJIDI
"Serializable transactions see only those changes that were committed at the time the transaction began, plus those changes made by the transaction itself through INSERT, UPDATE, and DELETE statements. Serializable transactions do not experience nonrepeatable reads or phantoms."

Resources