How to make transactions concurrent - spring

So, I have a concurrent application that I am building using Scala, Akka and Spring.
I create writer actors and pass each one a chunk of data. Each chunk contains objects of 3 different classes, and hence maps to 3 different tables. There are parent-child relations between these 3 classes, so the processing and insertion have to happen in order. There is a further requirement that either the whole chunk is inserted or none of it is. Hence the need for a transaction.
Essentially, from my writer I call an insert method like the one below.
@Transactional
def insert(): Unit = {
  repo.save(obj1)
  repo.save(obj2)
  repo.batchSave(List(obj3))
}
This happens from all my writers. Without the @Transactional annotation the system is highly concurrent and fast. With it, however, everything becomes serialized: all my chunks are written one after the other, which destroys the concurrency. So, what am I missing, or is this a mandatory trade-off, i.e. is it not possible to have both transactions and concurrency?
Also, a very basic question about transactions.
Let's say there are 2 transactions, T1 and T2:
T1
begin
insert1
insert2
insert3
commit
T2
begin
insert4
insert5
insert6
commit
If I have 2 transactions as above, with inserts as the only operations, will they run in parallel or will they be serialized? Is it the case that once T1 begins, it releases its locks only after commit? How can this be parallelized? All the isolation levels seem to talk about is a read and a write happening in parallel, hence the case for dirty reads and READ_UNCOMMITTED.
Additional details:
Sybase relational DB
Spring JDBC JdbcTemplate for inserts
Isolation levels: tried the default and READ_UNCOMMITTED
Any guidance or ideas would be immensely helpful. Thanks

Related

What's the performance penalty of long-lived DB transactions interleaved with one another?

Could anyone provide an explanation, or point me to a good source that explains, the impact of long-lived database transactions when other transactions are involved?
I'm having difficulty understanding the real impact on an application's performance of transactions where most of the queries are reads and only two or three are writes, given the different isolation levels.
Mostly I would like to understand it in the following situations:
- Neither the rows read nor the rows updated are involved in any other transaction.
- The rows read are involved in another transaction, but not the rows being updated, and this other transaction is read-only.
- The rows read are involved in another transaction, but not the rows being updated, and this other transaction is modifying some of the data being read. I understand that here it also matters whether the data is read before or after it is modified.
- Both the rows read and the rows updated are involved in another transaction that is also modifying the data.
These questions come up in the context of an application using microservices, where all application-layer services are annotated with @Transactional using JPA and PostgreSQL and, to transform the data, they need to make network calls to other microservices within the transaction to fetch some other values.

Dirty Reading in Hibernate

Dirty Read: The definition states that
dirty reading occurs when a transaction reads data from a row that has been modified by another transaction but not yet committed.
Assuming the definition is correct, I am unable to fathom any such situation.
Due to the principle of isolation, transaction A cannot see the uncommitted data of a row that has been modified by transaction B. If transaction B has simply not committed, how can transaction A see it in the first place? It seems only possible when both operations are performed under the same transaction.
Can someone please explain what I am missing here?
"Dirty", or uncommitted reads (UR) are a way to allow non-blocking reads. Reading uncommitted data is not possible in an Oracle database due to the multi-version concurrency control employed by Oracle; instead of trying to read other transactions' data each transaction gets its own snapshot of data as they existed (committed) at the start of the transaction. As a result all reads are essentially non-blocking.
In databases that use lock-based concurrency control, e.g DB2, uncommitted reads are possible. A transaction using the UR isolation level ignores locks placed by other transactions, and thus it is able to access rows that have been modified but not yet committed.
Hibernate, being an abstraction layer on top of a database, offers the UR isolation level support for databases that have the capability.
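For a concrete picture of what that capability amounts to underneath Hibernate, here is a minimal JDBC sketch, assuming a placeholder DB2 URL, credentials and an orders table: the connection requests READ_UNCOMMITTED, and on a lock-based database the subsequent reads ignore other transactions' write locks.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DirtyReadExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials; substitute your own driver and database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:db2://localhost:50000/sample", "user", "password")) {
            // Request the UR isolation level; whether it takes effect depends on
            // the database (Oracle, for example, will not serve dirty reads).
            conn.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT id, status FROM orders")) {
                while (rs.next()) {
                    // These rows may include changes that other transactions have
                    // written but not yet committed -- the "dirty" data.
                    System.out.println(rs.getLong("id") + " " + rs.getString("status"));
                }
            }
            conn.commit();
        }
    }
}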

Are database/sql transaction objects safe for concurrent access?

I need to execute several SQL queries (select, update, delete) concurrently and roll back if any goroutine errors out. Thus the question: are DB transactions safe for concurrent access?
DB is safe to be accessed from multiple goroutines:
DB is a database handle representing a pool of zero or more underlying connections.
It's safe for concurrent use by multiple goroutines.
Also Stmt is safe to be used from multiple goroutines:
Stmt is a prepared statement. Stmt is safe for concurrent use by multiple goroutines.
You should use only one sql.Tx per goroutine:
Once DB.Begin is called, the returned Tx is bound to a single connection
In general yes, but you have to define the level of safety that you require. The three standard phenomena that can occur in a transaction are:
- Dirty reads (read uncommitted data)
- Nonrepeatable reads (a row is retrieved twice and the values within the row differ between reads)
- Phantom reads (two identical queries are executed, and the collection of rows returned by the second query is different from the first)
Depending on what behavior is acceptable, you can use different isolation levels:
- Read uncommitted (all phenomena possible)
- Read committed (dirty reads prevented)
- Repeatable reads (phantom reads can occur)
- Serializable (none of the phenomena are possible)
In general, the "higher" the isolation level you use, the poorer the concurrency you get. Poorer in the sense that more locks are used, which blocks concurrent queries from other transactions. If you know that you are going to update a row you have selected, you can use select ... for update.
See for example http://en.wikipedia.org/wiki/Isolation_%28database_systems%29 for a more thorough explanation.

One data store. Multiple processes. Will this SQL prevent race conditions?

I'm trying to create a Ruby script that spawns several concurrent child processes, each of which needs to access the same data store (a queue of some type) and do something with the data. The problem is that each row of data should be processed only once, and a child process has no way of knowing whether another child process might be operating on the same data at the same instant.
I haven't picked a data store yet, but I'm leaning toward PostgreSQL simply because it's what I'm used to. I've seen the following SQL fragment suggested as a way to avoid race conditions, because the UPDATE clause supposedly locks the table row before the SELECT takes place:
UPDATE jobs
SET status = 'processed'
WHERE id = (
SELECT id FROM jobs WHERE status = 'pending' LIMIT 1
) RETURNING id, data_to_process;
But will this really work? It doesn't seem intuitive that Postgres (or any other database) could lock the table row before performing the SELECT, since the SELECT has to be executed to determine which table row needs to be locked for updating. In other words, I'm concerned that this SQL fragment won't really prevent two separate processes from selecting and operating on the same table row.
Am I being paranoid? And are there better options than traditional RDBMSs to handle concurrency situations like this?
As you said, use a queue. The standard solution for this in PostgreSQL is PgQ. It has all these concurrency problems worked out for you.
Do you really want many concurrent child processes that must operate serially on a single data store? I suggest that you create one writer process that has sole access to the database (whatever you use) and accepts requests from the other processes for the database operations you want. Then do the appropriate queue management in that process rather than making your database do it, and you are assured that only one process accesses the database at any time.
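Not the asker's Ruby setup, but a minimal sketch of that single-writer idea, assuming a hypothetical jobs table and with Java threads standing in for the child processes: the workers only put payloads on an in-memory queue, and one writer thread owns the only database connection.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleWriter {
    // Work produced by the workers; only the writer thread consumes it.
    private static final BlockingQueue<String> jobs = new LinkedBlockingQueue<>();

    public static void main(String[] args) {
        new Thread(SingleWriter::drainToDatabase).start();
        // Many producers, but none of them ever touches the database.
        for (int i = 0; i < 4; i++) {
            final int worker = i;
            new Thread(() -> jobs.add("payload from worker " + worker)).start();
        }
    }

    private static void drainToDatabase() {
        // Placeholder URL and credentials; the writer owns the only connection.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/queue_db", "user", "password");
             PreparedStatement insert = conn.prepareStatement(
                 "INSERT INTO jobs(status, data_to_process) VALUES ('pending', ?)")) {
            while (true) {
                String payload = jobs.take();   // blocks until work arrives
                insert.setString(1, payload);
                insert.executeUpdate();         // only this thread ever writes
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}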
The situation you are describing is called "Non-repeatable read". There are two ways to solve this.
The preferred way would be to set the transaction isolation level to at least REPEATABLE READ. This means that concurrent updates of the nature you described will fail: if two processes update the same rows in overlapping transactions, one of them will be canceled, its changes discarded, and it will return an error. That transaction will have to be retried. This is achieved by running
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
at the start of the transaction. I can't seem to find documentation that explains an idiomatic way of doing this in Ruby; you may have to emit that SQL explicitly.
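There is no Ruby here either, but a rough sketch of that retry loop over plain JDBC, with placeholder connection details and the question's jobs table: PostgreSQL cancels the losing transaction with SQLSTATE 40001 (serialization_failure), which is the signal to roll back and try again.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ClaimJobWithRetry {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/queue_db", "user", "password")) {
            conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
            conn.setAutoCommit(false);
            while (true) {
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery(
                         "UPDATE jobs SET status = 'processed' " +
                         "WHERE id = (SELECT id FROM jobs WHERE status = 'pending' LIMIT 1) " +
                         "RETURNING id, data_to_process")) {
                    if (rs.next()) {
                        System.out.println("claimed job " + rs.getLong("id"));
                    }
                    conn.commit();
                    break;                           // success, stop retrying
                } catch (SQLException e) {
                    conn.rollback();                 // the losing transaction is rolled back
                    if (!"40001".equals(e.getSQLState())) {
                        throw e;                     // some other failure, give up
                    }
                    // serialization_failure: another worker won the race, try again
                }
            }
        }
    }
}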
The other option is to manage the locking of tables explicitly, which can cause a transaction to block (and possibly deadlock) until the table is free. Transactions won't fail in the same way as they do above, but contention will be much higher, and so I won't describe the details.
That's pretty close to the approach I took when I wrote pg_message_queue, which is a simple queue implementation for PostgreSQL. Unlike PgQ, it requires no components outside of PostgreSQL to use.
It will work just fine. MVCC will come to the rescue.

Achieving ACID properties using JDBC?

First of all, I would like to confirm: is it the responsibility of the developer to follow these properties, or the responsibility of transaction APIs like JDBC?
Below is my understanding of how we achieve the ACID properties in JDBC.
Atomicity: as there is one transaction associated with a connection, we either commit or roll back, so there are no partial updates. Hence achieved.
Consistency: when some data integrity constraint is violated (say some check constraint), an SQLException will be thrown. The programmer then achieves a consistent database by rolling back the transaction?
One question on the above: say we run transaction 1, and an SQLException is thrown during transaction 2 as explained above. Now we catch the exception and do the commit; will the first transaction be committed?
Isolation: provided by the JDBC APIs. But this leads to the problem of concurrent updates, so it has to be dealt with manually, right?
Durability: provided by the JDBC APIs.
Please let me know if the above understanding is right.
ACID principles of transactional integrity are implemented by the database not by the API (like JDBC) or by the application. Your application's responsibility is to choose a database and a database configuration that supports whatever transactional integrity you need and to correctly identify the transactional boundaries in your application.
When an exception is thrown, your application has to determine whether it is appropriate to rollback the entire transaction or to proceed with additional processing. It may be appropriate if your application is processing orders from a vendor, for example, to process the 99 orders that succeed and log the 1 order that failed somewhere for users to investigate. On the other hand, you may reject all 100 orders because 1 failed. It depends what your application is doing.
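As a rough JDBC sketch of the "keep the 99 that succeed" choice (the orders table and connection details are hypothetical), one transaction can wrap the whole batch while a savepoint around each row lets the application skip a failed insert instead of rolling everything back:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Savepoint;
import java.util.List;

public class OrderBatch {
    public static void main(String[] args) throws Exception {
        List<String> orders = List.of("order-1", "order-2", "order-3");
        // Placeholder URL and credentials.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/shop", "user", "password");
             PreparedStatement insert = conn.prepareStatement(
                 "INSERT INTO orders(reference) VALUES (?)")) {
            conn.setAutoCommit(false);                   // the transaction boundary starts here
            for (String order : orders) {
                Savepoint before = conn.setSavepoint();  // mark the state before this order
                try {
                    insert.setString(1, order);
                    insert.executeUpdate();
                } catch (SQLException e) {
                    conn.rollback(before);               // undo only the failed order...
                    System.err.println("skipping " + order + ": " + e.getMessage());
                }                                        // ...and keep processing the rest
            }
            conn.commit();                               // the successful orders commit together
        }
    }
}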
In general, you only have one transaction open at a time (or, more accurately, one transaction per connection). So if you are working in transaction 2, transaction 1 by definition has already completed-- it was either committed or rolled back previously. Exceptions thrown in transaction 2 have no impact on transaction 1.
Depending on the transaction isolation level your application requests (and the transaction isolation levels your database supports) as well as the mechanics of your application, lost updates are something that you may need to be concerned about. If you set your transaction isolation level to read committed, it is possible that you would read a value as 'A' in transaction 1, wait for a user to do something, update the value to 'B', and commit without realizing that transaction 2 updated the value to 'C' between the time you read the data and the time you wrote the data. This may be a problem that you need to deal with or it may be something where it is fine for the last person to update a row to "win".
Your database, on the other hand, should take care of the automatic locking that prevents two transactions from simultaneously updating the same row of the same table. It may do this by locking more than is strictly necessary but it will serialize the updates somehow.
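If last-writer-wins is not acceptable, one common remedy is to hold the row lock across the whole read-modify-write. A minimal JDBC sketch, assuming a hypothetical accounts table and placeholder connection details:

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ReadModifyWrite {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/bank", "user", "password")) {
            conn.setAutoCommit(false);
            BigDecimal balance;
            try (PreparedStatement select = conn.prepareStatement(
                     "SELECT balance FROM accounts WHERE id = ? FOR UPDATE")) {
                select.setLong(1, 1L);
                try (ResultSet rs = select.executeQuery()) {
                    if (!rs.next()) {
                        throw new IllegalStateException("no such account");
                    }
                    balance = rs.getBigDecimal("balance");   // the row stays locked until commit
                }
            }
            try (PreparedStatement update = conn.prepareStatement(
                     "UPDATE accounts SET balance = ? WHERE id = ?")) {
                update.setBigDecimal(1, balance.add(new BigDecimal("10.00")));
                update.setLong(2, 1L);
                update.executeUpdate();
            }
            conn.commit();   // no other transaction could have changed the row in between
        }
    }
}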
