First of all, I would like to confirm: is it the responsibility of the developer to follow these properties, or the responsibility of transaction APIs like JDBC?
Below is my understanding of how we achieve ACID properties in JDBC:
Atomicity: as there is one transaction associated with a connection, we either commit or roll back, so there are no partial updates. Hence atomicity is achieved.
Consistency: when some data integrity constraint is violated (say, a check constraint), an SQLException will be thrown. Then the programmer achieves a consistent database by rolling back the transaction?
One question on the above: say we do transaction 1, and an SQLException is thrown during transaction 2 as explained above. If we now catch the exception and commit, will the first transaction be committed?
Isolation: provided by JDBC APIs. But this leads to the problem of concurrent updates, so it has to be dealt with manually, right?
Durability: provided by JDBC APIs.
Please let me know if the above understanding is right.
ACID principles of transactional integrity are implemented by the database, not by the API (like JDBC) or by the application. Your application's responsibility is to choose a database and a database configuration that supports whatever transactional integrity you need, and to correctly identify the transactional boundaries in your application.
When an exception is thrown, your application has to determine whether it is appropriate to roll back the entire transaction or to proceed with additional processing. If your application is processing orders from a vendor, for example, it may be appropriate to process the 99 orders that succeed and log the 1 order that failed somewhere for users to investigate. On the other hand, you may reject all 100 orders because 1 failed. It depends on what your application is doing.
In general, you only have one transaction open at a time (or, more accurately, one transaction per connection). So if you are working in transaction 2, transaction 1 by definition has already completed: it was either committed or rolled back previously. Exceptions thrown in transaction 2 have no impact on transaction 1.
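To make "identifying the transactional boundaries" concrete, here is a minimal sketch of how an application usually marks one boundary with plain JDBC: auto-commit (on by default for a new connection) is switched off, the work is committed if it completes, and rolled back if it throws. The helper name `runInTransaction` and the `SqlWork` interface are hypothetical; only the `java.sql.Connection` methods are real. Whether to roll back everything or to keep going after a failure remains the application's decision, as described above.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper that wraps one unit of work in a single JDBC transaction. */
final class JdbcTx {

    /** A unit of work that runs on the given connection and may throw SQLException. */
    @FunctionalInterface
    interface SqlWork {
        void run(Connection conn) throws SQLException;
    }

    private JdbcTx() {}

    static void runInTransaction(Connection conn, SqlWork work) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // all statements on this connection now share one transaction
        try {
            work.run(conn);
            conn.commit();         // every change becomes durable together
        } catch (SQLException | RuntimeException e) {
            try {
                conn.rollback();   // undo every statement of this transaction
            } catch (SQLException rollbackFailure) {
                e.addSuppressed(rollbackFailure);
            }
            throw e;               // the caller decides what happens next
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

A constraint violation thrown by any statement inside `work` therefore rolls back the whole unit; nothing from it is committed.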
Depending on the transaction isolation level your application requests (and the transaction isolation levels your database supports) as well as the mechanics of your application, lost updates are something that you may need to be concerned about. If you set your transaction isolation level to read committed, it is possible that you would read a value as 'A' in transaction 1, wait for a user to do something, update the value to 'B', and commit without realizing that transaction 2 updated the value to 'C' between the time you read the data and the time you wrote the data. This may be a problem that you need to deal with or it may be something where it is fine for the last person to update a row to "win".
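The lost-update scenario above can be illustrated with a toy in-memory row (no real database involved). A plain read-then-write lets transaction 1 silently overwrite transaction 2's 'C'; adding a version check turns that silent overwrite into a detectable conflict. The class and method names are hypothetical; in SQL the same check is commonly written as `UPDATE ... SET value = ?, version = version + 1 WHERE id = ? AND version = ?`, treating an update count of 0 as a conflict.

```java
/** Toy model of one table row with a version column (no real database involved). */
final class VersionedRow {
    private String value;
    private long version;

    VersionedRow(String initialValue) {
        this.value = initialValue;
    }

    synchronized String value() {
        return value;
    }

    synchronized long version() {
        return version;
    }

    /** Plain read-then-write: the last writer wins and earlier changes are silently lost. */
    synchronized void blindWrite(String newValue) {
        value = newValue;
        version++;
    }

    /**
     * Optimistic write: applied only if the row still has the version the caller read.
     * Returns false on conflict; the application then decides to retry, merge, or report.
     */
    synchronized boolean writeIfVersion(long expectedVersion, String newValue) {
        if (version != expectedVersion) {
            return false;
        }
        value = newValue;
        version++;
        return true;
    }
}
```

Replaying the scenario: transaction 1 reads 'A' at version 0, transaction 2 commits 'C' (version 1), and transaction 1's `writeIfVersion(0, "B")` returns false, so 'C' is not lost. With `blindWrite`, 'B' would silently replace 'C', which may or may not be acceptable ("last writer wins").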
Your database, on the other hand, should take care of the automatic locking that prevents two transactions from simultaneously updating the same row of the same table. It may do this by locking more than is strictly necessary but it will serialize the updates somehow.
I have been reading about microservices and distributed transactions. Most articles talk about two-phase commit (2PC) or the Saga pattern, but don't go into detail on how an object is locked so that others can't access that data while the transaction has not completed.
If I have a customer service and an order service, and I initiate a request to lock a customer's funds until the order has been processed, how is this achieved in a distributed system?
In DBs, is it possible to explicitly lock a row and then have another request unlock the row? Or is this achieved using a locked field on the customers table, which the first transaction sets to locked and, once the order is complete, sets back to unlocked (or empties)?
Examples with code samples would be great.
Most articles talk about two-phase commit (2PC) or the Saga pattern, but don't go into detail on how an object is locked so that others can't access that data while the transaction has not completed.
2PC is defined as a blocking protocol. That means that if the transaction manager, which manages the 2PC transaction, is down, the 2PC can't be resolved. The transaction manager is then a single point of failure.
If you can ensure that a failed transaction manager is restarted, then even though the 2PC protocol is said to be blocking, you have the assurance that the transaction manager will become available again and the resolution won't stay blocked.
Next, 2PC uses locks. They are required as a fundamental element of the protocol. The transaction manager communicates with participants (resources); here, the participant is the database. When 2PC starts running, the call to prepare means that the database makes persistent locks on all rows that participated in the transaction. These locks are released when the transaction manager calls commit (or rollback).
It's important to understand that before 2PC starts, the transaction is in-flight (not persistent); it's stored in memory. After prepare is called, the transaction state is stored persistently until commit is called (and at that point the protocol may be blocked by an unavailable transaction manager: the lock is persistent and the system waits for the transaction manager to release it).
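The prepare/commit sequence described above can be sketched as a toy, single-process coordinator. It is only an illustration under simplifying assumptions: the `Participant` interface is hypothetical, nothing is written to durable storage, and the coordinator never crashes, so the blocking case (a participant stuck in the prepared state, holding its locks, waiting for a dead transaction manager) is not modeled. Real transaction managers (JTA/XA implementations, for example) also record their decision durably before phase 2.

```java
import java.util.ArrayList;
import java.util.List;

/** A resource taking part in two-phase commit, e.g. one database (hypothetical interface). */
interface Participant {
    /** Phase 1: make the work durable and keep its locks; return false to vote "no". */
    boolean prepare();

    /** Phase 2, decision "commit": make the prepared work visible and release the locks. */
    void commit();

    /** Phase 2, decision "abort": discard the work and release the locks. */
    void rollback();
}

/** Toy transaction manager running both phases in one process. */
final class TwoPhaseCoordinator {
    private final List<Participant> participants = new ArrayList<>();

    void enlist(Participant p) {
        participants.add(p);
    }

    /** Returns true if the global transaction committed. */
    boolean complete() {
        // Phase 1: every participant must vote "yes".
        for (Participant p : participants) {
            if (!p.prepare()) {
                // A single "no" aborts the global transaction on every participant,
                // including those that were never asked to prepare.
                for (Participant q : participants) {
                    q.rollback();
                }
                return false;
            }
        }
        // Phase 2: everyone prepared, so the decision is commit.
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }
}
```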
That's locking from the 2PC perspective. But there are also transaction locks from the database perspective.
When you update a row in the database, the transaction is in-flight (stored in memory). At that time the database needs to ensure that concurrent updates won't corrupt your data. One way is to lock the row and not permit concurrent updates.
But most databases, by default and depending on the isolation level, do not lock the row against readers in these cases, as they use snapshot isolation (MVCC, https://en.wikipedia.org/wiki/Snapshot_isolation). In particular, this means the row is handled optimistically: other transactions keep reading the last committed version of the row instead of waiting for the writer.
But the 2PC prepare can't be processed optimistically. Once the database replies 'OK' to the prepare request from the transaction manager, the row is really locked.
Plus, you can't manage this locking by hand. If you try to, you break the consistency guarantee of 2PC.
In your example there is a customer service and an order service, and the 2PC transaction spans both services. The customer service updates its database and the order service updates its database as well. The in-flight transactions are still running in the databases. Then the request finishes and the transaction manager commands the in-flight transactions to commit. It runs 2PC: it invokes prepare on the customer service's database transaction, then on the order service's transaction, and then it calls commit.
If you use the saga pattern, the saga spans both services. From the transaction perspective, the customer service creates an in-flight database transaction and commits it immediately. Then the call goes to the order service, where the same happens. When the request finishes, the saga checks that everything ran fine. When a failure has happened, a compensation callback is called.
Failure is "the trouble" from the perspective of ease of use. With a saga, you need to implement the failure resolution on your own in the callback method. With 2PC, failure resolution is handled automatically by the rollback call.
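For contrast with 2PC, the saga flow above can be sketched as a list of local steps, each of which commits on its own and carries its own compensation callback. On failure, the already completed steps are compensated in reverse order. This is an in-memory illustration only: `SagaStep`, `Saga`, and the step names are hypothetical, and a production saga would also persist its progress so that compensation can resume after a crash.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** One local transaction in a saga plus the callback that semantically undoes it. */
final class SagaStep {
    final String name;
    final Runnable action;       // runs and commits a local transaction in one service
    final Runnable compensation; // e.g. refund funds that were already deducted

    SagaStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

final class Saga {
    private Saga() {}

    /** Runs steps in order; on failure compensates completed steps, newest first. */
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException failure) {
                // Nothing is rolled back automatically: each committed step must be compensated.
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

For example, if a "deduct funds" step succeeds and a "create order" step throws, only the refund compensation runs; the failed step itself committed nothing, so it is not compensated.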
A note: I tried to summarize 2PC here: https://developer.jboss.org/wiki/TwoPhaseCommit2PC
I'm not entirely sure the explanation is comprehensible enough, but you can take a look and let me know what is explained wrongly there. Thanks.
In the microservices world, transaction boundaries are within a service. The services rely on eventual consistency. So in your example, the order service will send a request (synchronous or asynchronous, depending on application semantics and scale requirements) like "Deduct x amount from customer y for order z".
The customer service would perform the action on the customer record in a transaction and return a response like "Order z successfully processed" or "Order z processing failed".
The order service can then trigger the confirmation or failure process of the order, depending on the response received.
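A minimal sketch of the request/response flow described above, assuming a synchronous call and in-memory maps standing in for each service's own tables. The message types (`DeductRequest`, `DeductResult`) and the service classes are hypothetical; `synchronized` stands in for the local database transaction that makes check-and-deduct atomic inside the customer service. No row stays locked across services: each side commits its own local change.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical message from the order service: "deduct amount from customer for order". */
final class DeductRequest {
    final String orderId;
    final String customerId;
    final long amountCents;

    DeductRequest(String orderId, String customerId, long amountCents) {
        this.orderId = orderId;
        this.customerId = customerId;
        this.amountCents = amountCents;
    }
}

/** Hypothetical reply sent back to the order service. */
final class DeductResult {
    final String orderId;
    final boolean success;

    DeductResult(String orderId, boolean success) {
        this.orderId = orderId;
        this.success = success;
    }
}

/** Owns customer balances; its transaction boundary never leaves this service. */
final class CustomerService {
    private final Map<String, Long> balances = new HashMap<>(); // stands in for the customer table

    synchronized void openAccount(String customerId, long balanceCents) {
        balances.put(customerId, balanceCents);
    }

    /** Check-and-deduct as one local transaction ("synchronized" plays that role here). */
    synchronized DeductResult handle(DeductRequest req) {
        Long balance = balances.get(req.customerId);
        if (balance == null || balance < req.amountCents) {
            return new DeductResult(req.orderId, false); // "Order z processing failed"
        }
        balances.put(req.customerId, balance - req.amountCents);
        return new DeductResult(req.orderId, true);      // "Order z successfully processed"
    }

    synchronized long balanceOf(String customerId) {
        return balances.getOrDefault(customerId, 0L);
    }
}

/** Confirms or rejects the order in its own local transaction, based on the reply. */
final class OrderService {
    private final Map<String, String> statusByOrder = new HashMap<>();

    synchronized void onDeductResult(DeductResult result) {
        statusByOrder.put(result.orderId, result.success ? "CONFIRMED" : "REJECTED");
    }

    synchronized String statusOf(String orderId) {
        return statusByOrder.get(orderId);
    }
}
```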
The application typically has to choose between availability and strong ACID consistency. Most microservice-based scenarios demand availability and high scalability rather than strong consistency, which means that communication between services is asynchronous and a consistent state is achieved eventually.
Could anyone provide an explanation, or point me to a good source that explains the impact of long-lived database transactions when other transactions are involved?
I'm having difficulty understanding the real impact on the performance of an application of having transactions where most of the queries are reads and maybe two or three are writes, given the different isolation levels.
Mostly I would like to understand it in the following situations:
Neither the rows read nor the rows updated are involved in any other transaction.
The rows read are involved in another transaction but not the rows being updated, and this other transaction is read-only.
The rows read are involved in another transaction but not the rows being updated, and this other transaction is modifying some of the data being read. I understand that here it also matters whether the data is read before or after it is modified.
Both the rows read and the rows updated are involved in another transaction that is also modifying the data.
These questions come up in the context of an application using microservices, where all application-layer services are annotated with @Transactional (using JPA and PostgreSQL) and, to transform the data, they need to make some network calls to other microservices within the transaction to fetch some other values.
I am confused about how the LockModeTypes work in JPA:
LockModeType.OPTIMISTIC
It increments the version while committing.
Question: if I have a version column in my entity and I don't specify this lock mode, it works similarly anyway, so what is the use of it?
LockModeType.OPTIMISTIC_FORCE_INCREMENT
Here it increments the version column even though the entity is not updated.
But what is the use of it? If any other process updated the same row before this transaction is committed, this transaction is going to fail anyway. So what is the use of this LockModeType?
LockModeType.PESSIMISTIC_READ
This lock mode issues a SELECT ... FOR UPDATE NOWAIT (if no timeout hint is specified).
So basically this means that no other transaction can update this row until this transaction is committed. Then it's basically a write lock, so why is it named a read lock?
LockModeType.PESSIMISTIC_WRITE
This lock mode also issues a SELECT ... FOR UPDATE NOWAIT (if no timeout hint is specified).
Question: what is the difference between this lock mode and LockModeType.PESSIMISTIC_READ, as I see both fire the same queries?
LockModeType.PESSIMISTIC_FORCE_INCREMENT
This does a SELECT ... FOR UPDATE NOWAIT (if no timeout hint is specified) and also increments the version number.
I totally didn't get the use of it.
Why is a version increment required if FOR UPDATE NOWAIT is there?
I would first differentiate between optimistic and pessimistic locks, because their underlying mechanisms are different.
Optimistic locking is fully controlled by JPA and only requires an additional version column in DB tables. It is completely independent of the underlying DB engine used to store the relational data.
On the other hand, pessimistic locking uses the locking mechanism provided by the underlying database to lock existing records in tables. JPA needs to know how to trigger these locks, and some databases do not support them, or support them only partially.
Now to the list of lock types:
LockModeType.OPTIMISTIC
If entities specify a version field, this is the default. For entities without a version column, using this type of lock isn't guaranteed to work on any JPA implementation. This mode is usually ignored, as stated by ObjectDB. In my opinion, it only exists so that you may compute the lock mode dynamically and pass it further even if the lock would be OPTIMISTIC in the end. Not a very likely use case, but it is always good API design to provide an option to reference even the default value.
Example:
```java
// resolveLockMode() is application code that picks the lock mode at runtime
LockModeType lockMode = resolveLockMode();
A a = em.find(A.class, 1, lockMode);
```
LockModeType.OPTIMISTIC_FORCE_INCREMENT
This is a rarely used option. But it could be reasonable if you want to lock references to this entity from another entity. In other words, you want to lock working with an entity even if it is not modified, because other entities may be modified in relation to this entity.
Example: we have entities Book and Shelf. It is possible to add a Book to a Shelf, but the book does not have any reference to its shelf. It is reasonable to lock the action of moving a book to a shelf, so that the book does not end up on another shelf (due to another transaction) before the end of this transaction. To lock this action, it is not sufficient to lock the book's current shelf entity, as the book does not have to be on a shelf yet. It also does not make sense to lock all target bookshelves, as they would probably be different in different transactions. The only thing that makes sense is to lock the book entity itself, even if in our case it does not get changed (it does not hold a reference to its bookshelf).
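The Book/Shelf idea can be modeled without JPA as follows. This is a toy illustration, not how a JPA provider is implemented: each "move book to shelf" transaction remembers the book's version when it starts and, at commit, checks that version and increments it even though no Book column changed. In JPA terms that corresponds to calling `em.lock(book, LockModeType.OPTIMISTIC_FORCE_INCREMENT)` on the book; the class and method names below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the Book/Shelf example (hypothetical names, no JPA involved). */
final class BookShelfDemo {
    private final Map<Long, Long> bookVersions = new HashMap<>();  // Book: version, no shelf reference
    private final Map<Long, String> shelfByBook = new HashMap<>(); // owned by the Shelf side

    BookShelfDemo(long... bookIds) {
        for (long id : bookIds) {
            bookVersions.put(id, 0L);
        }
    }

    /** Start of a "move book" transaction: remember the book's current version. */
    synchronized long beginMove(long bookId) {
        return bookVersions.get(bookId);
    }

    /**
     * Commit of the move. Like OPTIMISTIC_FORCE_INCREMENT, it checks the book's version and
     * increments it although the Book itself is unchanged, so a concurrent move of the same
     * book fails instead of putting the book on a second shelf.
     */
    synchronized boolean commitMove(long bookId, long versionAtBegin, String targetShelf) {
        if (bookVersions.get(bookId) != versionAtBegin) {
            return false; // JPA would throw OptimisticLockException here
        }
        shelfByBook.put(bookId, targetShelf);
        bookVersions.put(bookId, versionAtBegin + 1);
        return true;
    }

    synchronized String shelfOf(long bookId) {
        return shelfByBook.get(bookId);
    }
}
```

Two transactions that both begin moving book 1 see the same version; whichever commits first wins, and the other is rejected.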
LockModeType.PESSIMISTIC_READ
This mode is similar to LockModeType.PESSIMISTIC_WRITE, but differs in one thing: as long as no transaction holds a WRITE lock on the same entity, it should not block other transactions from reading the entity. It also allows other transactions to lock the entity using LockModeType.PESSIMISTIC_READ. The differences between WRITE and READ locks are well explained in the ObjectDB and OpenJPA documentation. If an entity is already locked by another transaction in a conflicting mode, any attempt to lock it will throw an exception. This behavior can be changed to wait for some time for the lock to be released before throwing an exception and rolling back the transaction. To do that, specify the javax.persistence.lock.timeout hint with the number of milliseconds to wait before throwing the exception. There are multiple ways to do this at multiple levels, as described in the Java EE tutorial.
LockModeType.PESSIMISTIC_WRITE
This is a stronger version of LockModeType.PESSIMISTIC_READ. When a WRITE lock is in place, JPA, with the help of the database, will prevent any other transaction from acquiring even a READ lock on the entity, not only from writing it as with a READ lock.
How this is implemented by a JPA provider in cooperation with the underlying DB is not prescribed. In your case with Oracle, I would say that Oracle does not provide anything close to a READ lock. SELECT ... FOR UPDATE is really rather a WRITE lock. It may be a bug in Hibernate, or just a decision to use the "harder" WRITE lock instead of implementing a custom "softer" READ lock. This mostly does not break consistency, but it does not follow all the rules of READ locks. You could run some simple tests with READ locks and long-running transactions to find out whether several transactions are able to acquire READ locks on the same entity. This should be possible with READ locks, but not with WRITE locks.
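The READ/WRITE distinction boils down to shared versus exclusive row locks. The toy lock table below is a hypothetical class, not a JPA or JDBC API, and follows the "NOWAIT" behavior discussed above: a conflicting request fails immediately instead of waiting. Several transactions may hold the shared lock at once, while the exclusive lock requires that no other transaction holds any lock on the row. On databases with shared row locks (PostgreSQL's `SELECT ... FOR SHARE`, for example), Hibernate can map PESSIMISTIC_READ to such a lock; on Oracle both modes end up as `FOR UPDATE`, as observed in the question.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of one row's locks: shared (READ) versus exclusive (WRITE), with NOWAIT semantics. */
final class RowLockModel {
    private final Set<String> sharedHolders = new HashSet<>();
    private String exclusiveHolder; // null while no transaction holds the WRITE lock

    /** Shared lock: fails at once only if another transaction holds the WRITE lock. */
    synchronized boolean tryLockShared(String tx) {
        if (exclusiveHolder != null && !exclusiveHolder.equals(tx)) {
            return false;
        }
        sharedHolders.add(tx);
        return true;
    }

    /** Exclusive lock: fails at once if any other transaction holds a READ or WRITE lock. */
    synchronized boolean tryLockExclusive(String tx) {
        boolean otherReaders = sharedHolders.stream().anyMatch(holder -> !holder.equals(tx));
        boolean otherWriter = exclusiveHolder != null && !exclusiveHolder.equals(tx);
        if (otherReaders || otherWriter) {
            return false;
        }
        exclusiveHolder = tx;
        return true;
    }

    /** Commit or rollback releases every lock the transaction holds. */
    synchronized void release(String tx) {
        sharedHolders.remove(tx);
        if (tx.equals(exclusiveHolder)) {
            exclusiveHolder = null;
        }
    }
}
```

This is exactly what the suggested test with long-running transactions should reveal: two READ locks coexist, while a WRITE lock excludes both readers and other writers.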
LockModeType.PESSIMISTIC_FORCE_INCREMENT
This is another rarely used lock mode. However, it is an option for cases where you need to combine PESSIMISTIC and OPTIMISTIC mechanisms. Using plain PESSIMISTIC_WRITE would fail in the following scenario:
(1) transaction A uses optimistic locking and reads entity E
(2) transaction B acquires a WRITE lock on entity E and changes data that depends on E, without modifying E's own columns
(3) transaction B commits and releases its lock on E
(4) transaction A updates E and commits
In step (4), since B did not modify E itself, E's version column was not incremented, so nothing prevents A from committing a change based on a stale view that ignores B's changes. Lock mode LockModeType.PESSIMISTIC_FORCE_INCREMENT forces transaction B to increment the version number, causing transaction A to fail with an OptimisticLockException, even though B was using pessimistic locking.
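A toy replay of the four steps, assuming (as is typical for this lock mode) that B changes only data related to E rather than E's own columns, since a real update of E would bump its version anyway. The classes are hypothetical and no JPA is involved: a `ReentrantLock` stands in for `SELECT ... FOR UPDATE`, and a `false` result from A's commit stands in for the OptimisticLockException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Toy replay of the four-step scenario (hypothetical classes, no JPA involved). */
final class ForceIncrementDemo {

    static final class Entity {
        final ReentrantLock rowLock = new ReentrantLock(); // stands in for SELECT ... FOR UPDATE
        final List<String> children = new ArrayList<>();   // related data, not E's own columns
        String value = "original";
        long version = 0;
    }

    private ForceIncrementDemo() {}

    /** Steps 2 and 3: B locks E, changes only related data, optionally bumps the version, commits. */
    static void transactionB(Entity e, boolean forceIncrement) {
        e.rowLock.lock();
        try {
            e.children.add("added by B");
            if (forceIncrement) {
                e.version++; // the extra effect of PESSIMISTIC_FORCE_INCREMENT
            }
        } finally {
            e.rowLock.unlock(); // commit releases the pessimistic lock
        }
    }

    /** Step 4: A's optimistic commit; false means JPA would throw OptimisticLockException. */
    static boolean commitA(Entity e, long versionReadInStep1, String newValue) {
        e.rowLock.lock();
        try {
            if (e.version != versionReadInStep1) {
                return false;
            }
            e.value = newValue;
            e.version++;
            return true;
        } finally {
            e.rowLock.unlock();
        }
    }

    /** Replays steps 1 to 4 and reports whether A's commit went through. */
    static boolean replay(boolean forceIncrement) {
        Entity e = new Entity();
        long versionReadByA = e.version;                    // step 1
        transactionB(e, forceIncrement);                    // steps 2 and 3
        return commitA(e, versionReadByA, "changed by A");  // step 4
    }
}
```

`replay(false)` lets A's stale commit through, while `replay(true)` makes it fail, which is the whole point of combining the pessimistic lock with a forced version increment.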
LockModeType.NONE
This is the default if entities don't provide a version field. It means that no locking is enabled: conflicts will be resolved on a best-effort basis and will not be detected. This is the only lock mode allowed outside of a transaction.
Dirty read: the definition states that
dirty reading occurs when a transaction reads data from a row that has been modified by another transaction but not yet committed.
Assuming the definition is correct, I am unable to imagine any such situation.
Due to the principle of isolation, transaction A cannot see the uncommitted data of a row that has been modified by transaction B. If transaction B has simply not committed, how can transaction A see it in the first place? It seems possible only when both operations are performed in the same transaction.
Can someone please explain what I am missing here?
"Dirty", or uncommitted, reads (UR) are a way to allow non-blocking reads. Reading uncommitted data is not possible in an Oracle database due to the multi-version concurrency control employed by Oracle; instead of trying to read other transactions' data, each query gets its own snapshot of the data as it existed (committed) at the start of the statement, or at the start of the transaction at the SERIALIZABLE level. As a result, all reads are essentially non-blocking.
In databases that use lock-based concurrency control, e.g. DB2, uncommitted reads are possible. A transaction using the UR isolation level ignores locks placed by other transactions, and thus it is able to access rows that have been modified but not yet committed.
Hibernate, being an abstraction layer on top of a database, supports the UR isolation level for databases that have that capability.
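What the UR level allows can be shown with a toy single-row model (hypothetical class, no real database): one transaction writes a value without committing, a reader at READ UNCOMMITTED sees it, a reader at READ COMMITTED does not, and after a rollback the dirty reader has observed a value that was never committed. With JDBC, an application requests the level via `Connection.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED)`; whether it is honored depends on the database (PostgreSQL, for example, treats it as READ COMMITTED).

```java
/** Toy single-row model of dirty reads under lock-based concurrency control. */
final class DirtyReadDemo {
    enum Isolation { READ_UNCOMMITTED, READ_COMMITTED }

    private String committed;
    private String pending; // uncommitted value written by an open transaction, or null

    DirtyReadDemo(String initial) {
        committed = initial;
    }

    synchronized void writeUncommitted(String value) {
        pending = value;
    }

    synchronized void commit() {
        if (pending != null) {
            committed = pending;
            pending = null;
        }
    }

    synchronized void rollback() {
        pending = null;
    }

    synchronized String read(Isolation level) {
        if (level == Isolation.READ_UNCOMMITTED && pending != null) {
            return pending; // dirty read: the reader ignores the writer's lock
        }
        return committed;
    }
}
```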
Can autonomous transactions be dangerous? If yes, in which situations? When are autonomous transactions necessary?
Yes, autonomous transactions can be dangerous.
Consider the situation where you have your main transaction. It has inserted/updated/deleted rows. If you then, within that, set up an autonomous transaction, then either:
(1) It will not query any data at all. This is the 'safe' situation. It can be useful to log information independently of the primary transaction so that it can be committed without impacting the primary transaction (which can be useful for logging error information when you expect the primary transaction to be rolled back).
(2) It will only query data that has not been updated by the primary transaction. This is safe, but superfluous. There is no point to the autonomous transaction.
(3) It will query data that has been updated by the primary transaction. This smacks of a poorly thought-through design, since you've overwritten something and then need to go back to see what it was before you overwrote it. Sometimes people think that an autonomous transaction will still see the uncommitted changes of the primary transaction, and it won't. It reads the currently committed state of the database, plus any changes made within the autonomous transaction. Some people (often trying autonomous transactions in response to mutating trigger errors) don't care what state the data is in when they try to read it, and these people simply shouldn't be allowed access to a database.
(4) It will try to update/delete data that hasn't been updated by the primary transaction. Again, this smacks of poor design. These changes are going to get committed (or rolled back) regardless of whether the primary transaction succeeds or fails. Worse, you risk issue (5), since it is hard to determine, within an autonomous transaction, whether the data has been updated by the primary transaction.
(5) It will try to update/delete data that has already been updated by the primary transaction, in which case it will deadlock and end up in an ugly mess.
Can autonomous transactions be dangerous?
Yes.
If yes, in which situations?
When they're misused. For example, when used to make changes to data which should have been rolled back if the rest of the parent transaction is rolled back. Misusing them can cause data corruption because some portions of a change are committed, while others are not.
When are autonomous transactions necessary?
They are necessary when the effects of one transaction must survive, regardless of whether the parent transaction is committed or rolled back. A good example is a procedure which logs the progress and activity of a process to a database table.
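In Oracle, autonomous transactions are declared in PL/SQL with `PRAGMA AUTONOMOUS_TRANSACTION`. The toy Java model below (hypothetical class, no database) only illustrates the logging use case from the answers: the log entry is committed on its own, so it survives even when the main transaction that produced it is rolled back, and the autonomous side sees only committed data, not the main transaction's pending rows. In a Java application, a comparable effect is often obtained by writing the log through a separate connection, or in Spring with a method annotated `@Transactional(propagation = Propagation.REQUIRES_NEW)`.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model contrasting a main transaction with an autonomous logging transaction. */
final class AutonomousLogDemo {
    final List<String> ordersTable = new ArrayList<>();           // committed rows only
    final List<String> errorLogTable = new ArrayList<>();         // committed rows only
    private final List<String> mainTxPending = new ArrayList<>(); // uncommitted work of the main tx

    void insertOrderInMainTx(String order) {
        mainTxPending.add(order);
    }

    void commitMainTx() {
        ordersTable.addAll(mainTxPending);
        mainTxPending.clear();
    }

    void rollbackMainTx() {
        mainTxPending.clear();
    }

    /** An autonomous transaction sees only committed data, not the main tx's pending rows. */
    List<String> readOrdersFromAutonomousTx() {
        return new ArrayList<>(ordersTable);
    }

    /** Autonomous transaction: commits at once, whatever later happens to the main transaction. */
    void logErrorAutonomously(String message) {
        errorLogTable.add(message);
    }
}
```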
When are autonomous transactions necessary?
Check my question: How can LOCK survive COMMIT or how can changes to LOCKed table be propagated to another session without COMMIT and losing LOCK
We ingest business configurations sequentially and should forbid parallel processing.
I lock the table with configurations and update other tables accordingly. I commit each batch of updates to the other tables, as we can't afford to keep one transaction open over all records: the probability of a collision would be near 0.99.
Each failure caused by concurrent access is persisted to a log for a later update attempt.