Are database/sql transaction objects safe for concurrent access? - go

I need to execute several SQL queries (select, update, delete) concurrently and roll back if any goroutine errors out. Thus the question: are DB transactions safe for concurrent access?

DB is safe to be accessed from multiple goroutines:
DB is a database handle representing a pool of zero or more underlying connections.
It's safe for concurrent use by multiple goroutines.
A Stmt is likewise safe to use from multiple goroutines:
Stmt is a prepared statement. Stmt is safe for concurrent use by multiple goroutines.
A Tx, on the other hand, is bound to a single connection and is not safe for concurrent use, so confine each sql.Tx to a single goroutine:
Once DB.Begin is called, the returned Tx is bound to a single connection
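For the scenario in the question (several statements run concurrently, everything rolled back if any goroutine fails), a minimal sketch is shown below. It assumes golang.org/x/sync/errgroup and a Postgres driver (github.com/lib/pq) as illustrative choices, with placeholder table names and connection string; each goroutine gets its own Tx drawn from the shared *sql.DB. Note that this is not one atomic unit of work, since each transaction still commits independently at the end.

package main

import (
    "context"
    "database/sql"
    "log"

    _ "github.com/lib/pq" // illustrative driver choice
    "golang.org/x/sync/errgroup"
)

// runAll executes each statement in its own transaction, one per goroutine.
// If any goroutine fails, every transaction is rolled back; otherwise all
// are committed at the end. The statements themselves are placeholders.
func runAll(ctx context.Context, db *sql.DB, stmts []string) error {
    g, ctx := errgroup.WithContext(ctx)
    txs := make([]*sql.Tx, len(stmts))

    for i, q := range stmts {
        i, q := i, q // capture loop variables for the closure
        g.Go(func() error {
            tx, err := db.BeginTx(ctx, nil) // one Tx per goroutine
            if err != nil {
                return err
            }
            txs[i] = tx
            _, err = tx.ExecContext(ctx, q)
            return err
        })
    }

    if err := g.Wait(); err != nil {
        for _, tx := range txs { // roll back everything that was started
            if tx != nil {
                tx.Rollback()
            }
        }
        return err
    }
    for _, tx := range txs { // all goroutines succeeded: commit
        if err := tx.Commit(); err != nil {
            return err
        }
    }
    return nil
}

func main() {
    db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    stmts := []string{"UPDATE a SET n = n + 1", "DELETE FROM b WHERE done"}
    if err := runAll(context.Background(), db, stmts); err != nil {
        log.Println("rolled back:", err)
    }
}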

In general yes, but you have to define the level of safety that you require. The three standard phenomena that can occur in a transaction are:
- Dirty reads (read uncommitted data)
- Nonrepeatable reads (a row is retrieved twice and the values within the row differ between reads)
- Phantom reads (two identical queries are executed, and the collection of rows returned by the second query differs from the first)
Depending on which of these behaviors is acceptable, you can choose among the different isolation levels:
- Read uncommitted (all three phenomena are possible)
- Read committed (dirty reads are prevented)
- Repeatable read (only phantom reads can still occur)
- Serializable (none of the phenomena is possible)
In general, the higher the isolation level you use, the poorer the concurrency you get: more locks are taken, which blocks concurrent queries from other transactions. If you know you are going to update a row you have just selected, you can use SELECT ... FOR UPDATE.
See for example http://en.wikipedia.org/wiki/Isolation_%28database_systems%29 for a more thorough explanation.
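As a rough illustration of both points in Go's database/sql, the fragment below requests an isolation level through sql.TxOptions and locks the row it is about to modify with SELECT ... FOR UPDATE. It assumes a db *sql.DB and ctx in scope, Postgres-style placeholders, and a made-up accounts table; treat it as a sketch, not a recipe for any particular database.

tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelRepeatableRead})
if err != nil {
    return err
}
defer tx.Rollback() // a no-op once the transaction has been committed

// Lock the row so no other transaction can change it before our UPDATE.
var balance int
err = tx.QueryRowContext(ctx,
    "SELECT balance FROM accounts WHERE id = $1 FOR UPDATE", 42).Scan(&balance)
if err != nil {
    return err
}

if _, err := tx.ExecContext(ctx,
    "UPDATE accounts SET balance = $1 WHERE id = $2", balance-10, 42); err != nil {
    return err
}
return tx.Commit()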

Related

How to make transactions concurrent

So, I have a concurrent application that I am building using Scala, Akka and Spring.
I create writer actors and pass each one a chunk of data. This chunk of data belongs to 3 different classes, hence 3 different tables. There are parent-child relations between these 3 classes, so the processing and insertion have to happen serially. Furthermore, there is a requirement that either the whole chunk is inserted or none of it. Hence the need for a transaction.
Essentially, from my writer I call an insert method like the one described below.
@Transactional
def insert(): Unit = {
  repo.save(obj1)
  repo.save(obj2)
  repo.batchSave(List(obj3))
}
This happens from all my writers. Without the @Transactional, the system is highly concurrent and fast. With it, however, everything becomes serialized: all my chunks are written one after the other, destroying my concurrency. So, what am I missing, or is this a mandatory trade-off, i.e. is it not possible to have both transactions and concurrency?
Also, a very basic question about transactions.
Let's say there are two transactions, T1 and T2:
T1: begin; insert1; insert2; insert3; commit
T2: begin; insert4; insert5; insert6; commit
If I have two transactions as above, with inserts as the only operations, will they run in parallel or be serialized? Is it the case that once T1 begins, it releases its locks only after commit? How could this be parallelized? All the isolation levels seem to talk about is a read and a write happening in parallel, hence the case for dirty reads and READ_UNCOMMITTED.
Additional details:
Sybase relational DB
SpringJDBC jdbcTemplate for inserts
Isolation levels: tried the default and READ_UNCOMMITTED
Any guidance or ideas would be immensely helpful. Thanks

Dirty reading in Hibernate

Dirty Read: The definition states that
dirty reading occurs when a transaction reads data from a row that has been modified by another transaction but not yet committed.
Assuming the definition is correct, I am unable to fathom any such situation.
Due to the principle of isolation, transaction A cannot see the uncommitted data of a row that has been modified by transaction B. If transaction B has simply not committed, how can transaction A see it in the first place? It seems only possible when both operations are performed under the same transaction.
Can someone please explain what am I missing here?
"Dirty", or uncommitted reads (UR) are a way to allow non-blocking reads. Reading uncommitted data is not possible in an Oracle database due to the multi-version concurrency control employed by Oracle; instead of trying to read other transactions' data each transaction gets its own snapshot of data as they existed (committed) at the start of the transaction. As a result all reads are essentially non-blocking.
In databases that use lock-based concurrency control, e.g. DB2, uncommitted reads are possible. A transaction using the UR isolation level ignores locks placed by other transactions, and is thus able to access rows that have been modified but not yet committed.
Hibernate, being an abstraction layer on top of a database, offers support for the UR isolation level on databases that have the capability.
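For a concrete, if rough, illustration outside of Hibernate, this is approximately how a dirty-read transaction would be requested through Go's database/sql; the same isolation request exists under JDBC and therefore Hibernate, and whether it has any effect depends entirely on the engine, as described above. The db, ctx, and the stock table are assumptions for the sketch.

// Only meaningful on lock-based engines such as DB2 or SQL Server;
// MVCC engines either ignore the request or treat it as READ COMMITTED.
tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelReadUncommitted})
if err != nil {
    return err
}
defer tx.Rollback()

var qty int
// May observe a value written by another, not-yet-committed transaction.
err = tx.QueryRowContext(ctx,
    "SELECT quantity FROM stock WHERE sku = ?", "A-100").Scan(&qty)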

Which isolation level should I use for booking flights?

I have a flight reservation program that uses MS SQL Server.
For reserving flights I want to be sure: should I use an isolation level or locks?
(This is sample code; my problem is the isolation level for this situation, not how to do the reservation.)
My Database has a table for inventory like:
Inventory Table
------------------------
id (Pk),
FlightNumber,
Total,
Sold
Now if someone wants to reserve a flight, I use this code in a transaction:
DECLARE @total int;
DECLARE @sold int;
SELECT @total = Total, @sold = Sold FROM Inventory WHERE FlightNumber = 'F3241b';
IF @total - @sold > 0
BEGIN
    UPDATE Inventory SET Sold = Sold + 1 WHERE FlightNumber = 'F3241b';
    PRINT 'Reserve Complete'
END
ELSE
    PRINT 'This flight is full'
I have two questions:
Q1: Should I use locks or isolation levels? Does one have any performance benefit over the other?
Q2: Depending on the answer to Q1, which isolation level or lock should I use?
If you're looking to see what isolation level will make the sample code work as it stands, rather than what is the best way to solve the problem addressed by the sample code, you would need the guarantees of at least REPEATABLE READ.
Databases which use strict two-phase locking (S2PL) for concurrency allow READ COMMITTED transactions to drop shared locks at the completion of each statement, or even earlier, so between the time transaction A checks availability and the time it claims the seats, someone else could come through with transaction B and read again, without causing either transaction to fail. Transaction A might block transaction B briefly, but both would update, and you could be over-sold.
In databases using multi-version concurrency control (MVCC) for concurrency, reads don't block writes and writes don't block reads. At READ COMMITTED, each statement uses a new snapshot of the database based on what has committed, and in at least some (I know this is true in PostgreSQL), concurrent writes are resolved without error. So even if transaction A was in the process of updating the sold count, or had done so and not committed, transaction B would see the old count and proceed to update. When it attempted the update, it could block waiting for the previous update, but once that committed, it would find the new version of the row, check whether it meets the selection criteria, update if it does and ignore the row if not, and proceed to commit without error. So, again, you are over-sold.
I guess that answers Q2, if you choose to use transaction isolation. The problem can be solved at a lower isolation level by modifying the example code to take explicit locks, but that will usually cause more blocking than using an isolation level which is strict enough to handle it automatically.
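To make the "strict enough isolation level" option concrete, here is a hedged sketch using Go's database/sql, a made-up Inventory table, ? placeholders, and the usual context/database/sql/errors/strings imports assumed. The key point is the retry loop: under REPEATABLE READ or SERIALIZABLE, one of two conflicting reservations is aborted with a deadlock or serialization error, and the error detection shown here is deliberately crude because the exact error code depends on the engine and driver.

// tryReserve runs the check-then-update inside one strict transaction.
func tryReserve(ctx context.Context, db *sql.DB, flight string) error {
    tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelRepeatableRead})
    if err != nil {
        return err
    }
    defer tx.Rollback()

    var total, sold int
    if err := tx.QueryRowContext(ctx,
        "SELECT Total, Sold FROM Inventory WHERE FlightNumber = ?", flight,
    ).Scan(&total, &sold); err != nil {
        return err
    }
    if total-sold <= 0 {
        return errors.New("this flight is full")
    }
    if _, err := tx.ExecContext(ctx,
        "UPDATE Inventory SET Sold = Sold + 1 WHERE FlightNumber = ?", flight); err != nil {
        return err
    }
    return tx.Commit()
}

// reserve retries when the transaction is aborted by a concurrency conflict.
func reserve(ctx context.Context, db *sql.DB, flight string) error {
    for attempt := 0; attempt < 5; attempt++ {
        err := tryReserve(ctx, db, flight)
        if err == nil {
            return nil
        }
        // Crude, driver-dependent check for deadlock/serialization failures.
        msg := strings.ToLower(err.Error())
        if !strings.Contains(msg, "deadlock") && !strings.Contains(msg, "40001") {
            return err
        }
    }
    return errors.New("too many concurrency conflicts, giving up")
}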
You are overly complicating things. All your queries can be replaced with:
Update inventory
set Sold = Sold + 1
where FlightNumber = 'F3241b'
AND Total - Sold > 0 -- Important!
If the flight is full, the second condition is not met, so the UPDATE does not take place and returns 0 modified rows; otherwise the query increments Sold and returns 1 modified row.
In this case any isolation level is fine, because a single statement is always atomic. This is somewhat similar to optimistic locking.
BTW this query can easily be tuned to allow an arbitrary number of reservations to be made atomically:
Update inventory
set Sold = Sold + @seats
where FlightNumber = 'F3241b'
AND Total - Sold >= @seats
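From application code, all the caller has to do is inspect the affected-row count. A small sketch of that follows, using Go's database/sql purely for illustration; the function name, the placeholder style, and the seats parameter are assumptions.

// reserveSeats returns true if the seats were reserved, false if the flight
// did not have enough free seats. Any isolation level works here: the whole
// check-and-increment happens inside one atomic UPDATE statement.
func reserveSeats(ctx context.Context, db *sql.DB, flight string, seats int) (bool, error) {
    res, err := db.ExecContext(ctx,
        "UPDATE Inventory SET Sold = Sold + ? WHERE FlightNumber = ? AND Total - Sold >= ?",
        seats, flight, seats)
    if err != nil {
        return false, err
    }
    n, err := res.RowsAffected()
    if err != nil {
        return false, err
    }
    return n == 1, nil
}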
See this link that explains SNAPSHOT ISOLATION level in SQL Server.
http://msdn.microsoft.com/en-us/library/ms345124(v=sql.90).aspx
They talk about a car rental application.
If you need a more restrictive isolation level you could move to IsolationLevel Serializable. But be warned that this is prone to locking and might affect your performance.

One data store. Multiple processes. Will this SQL prevent race conditions?

I'm trying to create a Ruby script that spawns several concurrent child processes, each of which needs to access the same data store (a queue of some type) and do something with the data. The problem is that each row of data should be processed only once, and a child process has no way of knowing whether another child process might be operating on the same data at the same instant.
I haven't picked a data store yet, but I'm leaning toward PostgreSQL simply because it's what I'm used to. I've seen the following SQL fragment suggested as a way to avoid race conditions, because the UPDATE clause supposedly locks the table row before the SELECT takes place:
UPDATE jobs
SET status = 'processed'
WHERE id = (
SELECT id FROM jobs WHERE status = 'pending' LIMIT 1
) RETURNING id, data_to_process;
But will this really work? It doesn't seem intuitive that Postgres (or any other database) could lock the table row before performing the SELECT, since the SELECT has to be executed to determine which table row needs to be locked for updating. In other words, I'm concerned that this SQL fragment won't really prevent two separate processes from selecting and operating on the same table row.
Am I being paranoid? And are there better options than traditional RDBMSs to handle concurrency situations like this?
As you said, use a queue. The standard solution for this in PostgreSQL is PgQ. It has all these concurrency problems worked out for you.
Do you really want many concurrent child processes that must operate serially on a single data store? I suggest that you create one writer process who has sole access to the database (whatever you use) and accepts requests from the other processes to do the database operations you want. Then do the appropriate queue management in that thread rather than making your database do it, and you are assured that only one process accesses the database at any time.
The situation you are describing is called "Non-repeatable read". There are two ways to solve this.
The preferred way would be to set the transaction isolation level to at least REPEATABLE READ. This means that concurrent updates of the nature you described will fail: if two processes update the same rows in overlapping transactions, one of them will be canceled, its changes discarded, and an error returned. That transaction will have to be retried. This is achieved by calling
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
at the start of the transaction. I can't seem to find documentation that explains an idiomatic way of doing this from Ruby; you may have to emit that SQL explicitly.
The other option is to manage the locking of tables explicitly, which can cause a transaction to block (and possibly deadlock) until the table is free. Transactions won't fail in the same way as they do above, but contention will be much higher, and so I won't describe the details.
That's pretty close to the approach I took when I wrote pg_message_queue, which is a simple queue implementation for PostgreSQL. Unlike PgQ, it requires no components outside of PostgreSQL to use.
It will work just fine. MVCC will come to the rescue.
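For completeness, this is roughly how a worker could drive that UPDATE ... RETURNING fragment, sketched in Go rather than Ruby and with hypothetical names; whether two workers can occasionally claim the same row still depends on the isolation level, as discussed in the answers above. It assumes a db *sql.DB and the jobs table from the question.

// claimJob marks one pending job as processed and returns its payload.
// sql.ErrNoRows means nothing was pending (or another worker got there first).
func claimJob(ctx context.Context, db *sql.DB) (id int64, data string, err error) {
    err = db.QueryRowContext(ctx, `
        UPDATE jobs
           SET status = 'processed'
         WHERE id = (SELECT id FROM jobs WHERE status = 'pending' LIMIT 1)
        RETURNING id, data_to_process`).Scan(&id, &data)
    return id, data, err
}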

Deadlock when updating the same database record from multiple connection sessions concurrently

We have implemented a client-server, socket-based application to process shopping cart requests; we receive thousands of them daily.
To handle this we implemented a multi-threaded architecture that processes requests concurrently. We use an Oracle connection pool for database operations, with the pool size set to an optimal value. As per our business process we have a main database table, and the same set of rows must be updated by multiple threads through multiple connection sessions concurrently. We are now getting deadlocks because multiple threads try to update the same rows in different sessions at the same time, and we also see primary key violations on some tables. Sometimes the database also gets locked up by inserts of the same data from multiple sessions concurrently.
Please suggest a good approach to handle the above problems.
There are a few different general solutions to writing multithreaded code that does not encounter deadlocks. The simplest is to ensure that you always lock resources in the same order.
A deadlock occurs when one session holds a lock on A and wants a lock on B while another session holds a lock on B and wants a lock on A. If you ensure that your code always locks A before B (or B before A), you can be guaranteed that you won't have a deadlock.
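A small sketch of what "always lock A before B" can look like in practice is shown below, written in Go against a hypothetical cart_items table (the :1 placeholder style and the sort/context/database/sql imports are assumptions): every session sorts the keys it needs to touch and takes the row locks in that order, so no two sessions can end up waiting on each other in a cycle.

// updateInOrder updates a set of rows while acquiring their locks in a fixed
// (ascending id) order, which is the classic way to rule out deadlock cycles.
func updateInOrder(ctx context.Context, tx *sql.Tx, ids []int64) error {
    sorted := append([]int64(nil), ids...)
    sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

    for _, id := range sorted {
        // Each UPDATE takes the row lock; since every session walks the ids
        // in the same order, "A waits for B while B waits for A" cannot occur.
        if _, err := tx.ExecContext(ctx,
            "UPDATE cart_items SET qty = qty + 1 WHERE id = :1", id); err != nil {
            return err
        }
    }
    return nil
}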
As for your comment about primary key violations, are you using something other than an Oracle sequence to generate your primary keys? If so, that is almost certainly the problem. Oracle sequences are explicitly designed to provide unique primary keys in the case where you have multiple sessions doing simultaneous inserts.
