What's the performance penalty of long-lived DB transactions interleaved with one another?

Could anyone provide an explanation, or point me to a good source that explains the impact of long-lived database transactions when other transactions are involved?
I'm having difficulty understanding the real performance impact on an application of transactions where most of the queries are reads and only two or three are writes, given the different isolation levels.
Mostly I would like to understand it in the following situations:
Neither the rows read nor the rows updated are involved in any other transaction.
The rows read are involved in another transaction, but not the rows being updated, and this other transaction is read-only.
The rows read are involved in another transaction, but not the rows being updated, and this other transaction is modifying some of the data being read. I understand that here it also matters whether the data is read before or after it is modified.
Both the rows read and the rows updated are involved in another transaction that is also modifying the data.
These questions come up in the context of an application built from microservices, where all application-layer services are annotated with @Transactional (using JPA and PostgreSQL) and, to transform the data, they need to make network calls to other microservices within the transaction to fetch additional values.
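For concreteness, a minimal sketch of the kind of service being described (all class, repository and client names here are hypothetical, not taken from the question): a JPA read, a remote call to another microservice, and a write, all inside one @Transactional method, so the database connection, the transaction snapshot and any row locks taken by the write are held for the full duration of the network round trip.

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class EnrichmentService {

    private final OrderRepository orderRepository; // hypothetical Spring Data JPA repository
    private final PricingClient pricingClient;     // hypothetical HTTP client for another microservice

    public EnrichmentService(OrderRepository orderRepository, PricingClient pricingClient) {
        this.orderRepository = orderRepository;
        this.pricingClient = pricingClient;
    }

    @Transactional
    public void enrich(long orderId) {
        Order order = orderRepository.findById(orderId).orElseThrow(); // read inside the transaction
        Price price = pricingClient.fetchPrice(order.getSku());        // network call while the transaction stays open
        order.setTotal(price.amount());                                // write; the row stays locked/versioned until commit
    }
}

The longer the remote calls take, the longer each transaction keeps its snapshot (and a pooled connection) open, which is where the four scenarios listed above start to matter.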

Related

Transaction Management and Performance in Spring-based Java Applications

Our scenario: we receive batches of messages from Kafka and write them to the DB after some processing. Currently we achieve DB write rates (in our company network) of up to 300-310 thousand records/min, but my colleagues want more (500K-600K/min).
The affected Java application has a functional layer (a "business facade", so to speak); underneath we have persistence-layer classes, which write records, grouped by individual table, into the DB as bulk inserts/updates. Each bulk insert/update has been implemented with @Transactional(REQUIRED), i.e. the default setting. Therefore, a received group of Kafka messages often means more than one database transaction.
I know that a DB commit is expensive in terms of performance. I used the following settings when configuring our Spring-based data sources:
useConfigs=maxPerformance
rewriteBatchedStatements=true
prepStmtCacheSize=256
prepStmtCacheSqlLimit=2048
This did improve performance, but not to the desired benchmark of 500K-600K DB writes/min.
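For reference, these look like MySQL Connector/J driver parameters, usually supplied on the JDBC URL; a minimal sketch of how they might be applied with HikariCP (host, schema and credentials are placeholders):

import javax.sql.DataSource;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class DataSourceSketch {

    static DataSource buildDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://db-host:3306/appdb"
                + "?useConfigs=maxPerformance"
                + "&rewriteBatchedStatements=true" // lets the driver rewrite JDBC batches into multi-row INSERTs
                + "&prepStmtCacheSize=256"
                + "&prepStmtCacheSqlLimit=2048");
        config.setUsername("app");
        config.setPassword("secret");
        return new HikariDataSource(config);
    }
}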
My question to you, colleagues: is it OK, from the standpoint of software architecture and for the sake of performance, to annotate our "business facade" class as @Transactional(REQUIRED) and the DB-layer classes as @Transactional(SUPPORTS)? That way there would be only one transaction per group/bulk of Kafka messages, which should increase the DB write rates by avoiding "excessive" commits.
Personally, I'm a bit hesitant about this change. On the one hand, I'm breaking the boundaries of responsibility between the individual classes/layers: high-level business-logic classes should know nothing about transaction management, and the persistence-layer classes should treat DB transactions as their core task. On the other hand, unwanted cross-dependencies arise: if an update for table XYZ fails, then a rollback is also made for another table ABC, even though everything ran smoothly there (remember, all tables now get their updates and inserts within one transaction!).
What do you think about this potential change in transaction management? How can you fine-tune a Spring Boot application to achieve higher write rates (configuration or perhaps implementation changes)?
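A sketch of the proposed layering (class and record names are hypothetical): one transaction is opened at the facade per Kafka batch, and the persistence-layer methods join it via SUPPORTS instead of each starting their own.

import java.util.List;

import org.springframework.stereotype.Repository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
class MessageBatchFacade {

    private final XyzTableWriter xyzWriter;
    private final AbcTableWriter abcWriter; // analogous writer for table ABC

    MessageBatchFacade(XyzTableWriter xyzWriter, AbcTableWriter abcWriter) {
        this.xyzWriter = xyzWriter;
        this.abcWriter = abcWriter;
    }

    @Transactional(propagation = Propagation.REQUIRED) // one transaction per Kafka batch
    public void persistBatch(List<XyzRecord> xyzRecords, List<AbcRecord> abcRecords) {
        xyzWriter.bulkUpsert(xyzRecords);
        abcWriter.bulkUpsert(abcRecords); // a failure here rolls back the XYZ writes as well
    }
}

@Repository
class XyzTableWriter {

    @Transactional(propagation = Propagation.SUPPORTS) // joins the caller's transaction if one is open
    public void bulkUpsert(List<XyzRecord> records) {
        // batched INSERT/UPDATE statements for table XYZ go here
    }
}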

Evaluation of Ehcache in a web application

Is it good practice to store your data in Ehcache to improve the performance of a web application when the data receives lots of update operations on a regular basis?
It all depends on how many reads you have relative to writes. Your updates will be costlier, so the time gained on reads should offset that.
Ehcache handles concurrent access. However, it is atomic, not transactional, so if you are getting multiple values from different caches, you can see updates in between. But that's the same with a database. Also, you can use XA to make sure your writes stay in sync with the database.
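As a rough illustration of that read/write trade-off with Spring's cache abstraction backed by Ehcache (service, repository and entity names are hypothetical): every read served from the cache skips the database, while every update pays for the database write plus keeping the cache entry fresh.

import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class ProductService {

    private final ProductRepository repository; // hypothetical repository backed by the database

    public ProductService(ProductRepository repository) {
        this.repository = repository;
    }

    // Cheap when the cache hits: no database round trip.
    @Cacheable(cacheNames = "products", key = "#id")
    public Product findById(long id) {
        return repository.findById(id);
    }

    // Every update pays twice: the DB write plus refreshing the cache entry.
    @CachePut(cacheNames = "products", key = "#product.id")
    public Product update(Product product) {
        return repository.save(product);
    }
}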

Can we persist two different table entities in DynamoDB under a single transaction

I have two tables in Amazon DynamoDB where I have to persist data in a single transaction using Spring Boot. If the persistence fails for the second table, it should roll back for the first table as well.
I have tried looking into the AWS Labs DynamoDB transactions library, but it only helps for a single table.
Try using the built-in DynamoDB transactions capability. From the limited information you give, it should do what you are looking for across tables in the same region. Just keep in mind that there is no rollback per se: either all items in a transaction are written or none of them are. The internal transaction coordinator handles that for you, though.
Now that this feature is out, you most likely should not be looking at the AWS Labs tool.
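A minimal sketch of that built-in capability using TransactWriteItems in the AWS SDK for Java v2 (table, key and attribute names are made up): both puts are applied atomically, and if either is rejected the whole transaction is cancelled.

import java.util.Map;

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.Put;
import software.amazon.awssdk.services.dynamodb.model.TransactWriteItem;
import software.amazon.awssdk.services.dynamodb.model.TransactWriteItemsRequest;

public class TwoTableTransactionSketch {

    public static void main(String[] args) {
        DynamoDbClient dynamoDb = DynamoDbClient.create();

        TransactWriteItem orderWrite = TransactWriteItem.builder()
                .put(Put.builder()
                        .tableName("Orders")
                        .item(Map.of("orderId", AttributeValue.builder().s("o-1").build()))
                        .build())
                .build();

        TransactWriteItem paymentWrite = TransactWriteItem.builder()
                .put(Put.builder()
                        .tableName("Payments")
                        .item(Map.of("paymentId", AttributeValue.builder().s("p-1").build()))
                        .build())
                .build();

        // Both writes succeed together or neither is persisted.
        dynamoDb.transactWriteItems(TransactWriteItemsRequest.builder()
                .transactItems(orderWrite, paymentWrite)
                .build());
    }
}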

Commits in the absence of locks in CockroachDB

I'm trying to understand how ACID in CockroachDB works without locks, from an application programmer's point of view. I would like to use it for an accounting/ERP application.
When two users update the same database field (e.g. a general ledger account total field) at the same time what does CockroachDB do? Assuming each is updating many other non-overlapping fields at the same time as part of the respective transactions.
Will the aborted application's commit process be informed about this immediately at the time of the commit?
Do we need to take care of additional possibilities than, for example, in ACID/locking PostgreSQL when we write the database access code in our application?
Or is writing code for accessing CockroachDB, for all practical purposes, the same as for accessing a standard RDBMS with respect to commits and in general?
Of course, ignoring performance issues / joins, etc.
I'm trying to understand how ACID in CockroachDB works without locks, from an application programmer's point of view. I would like to use it for an accounting/ERP application.
CockroachDB does have locks, but uses different terminology. Some of the existing documentation that talks about optimistic concurrency control is currently being updated.
When two users update the same database field (e.g. a general ledger account total field) at the same time what does CockroachDB do? Assuming each is updating many other non-overlapping fields at the same time as part of the respective transactions.
One of the transactions will block waiting for the other to commit. If a deadlock between the transactions is detected, one of the two transactions involved in the deadlock will be aborted.
Will the aborted application's commit process be informed about this immediately at the time of the commit?
Yes.
Do we need to take care of additional possibilities than, for example, in ACID/locking PostgreSQL when we write the database access code in our application?
Or is writing code for accessing CockroachDB, for all practical purposes, the same as for accessing a standard RDBMS with respect to commits and in general?
At a high level there is nothing additional for you to do. CockroachDB defaults to serializable isolation, which can result in more transaction restarts than weaker isolation levels, but comes with the advantage that the application programmer doesn't have to worry about anomalies.
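The practical difference is retry handling: a transaction aborted due to contention surfaces to the client as SQLSTATE 40001, and the usual advice is to wrap transactions in a retry loop. A minimal JDBC sketch (connection string, table and columns are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class RetrySketch {

    // Retries the update when the transaction is aborted with a serialization
    // failure (SQLSTATE 40001), which is how contention is reported to the client.
    static void addToLedger(String url, long accountId, long amount) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url)) {
            conn.setAutoCommit(false);
            while (true) {
                try (PreparedStatement stmt = conn.prepareStatement(
                        "UPDATE accounts SET total = total + ? WHERE id = ?")) {
                    stmt.setLong(1, amount);
                    stmt.setLong(2, accountId);
                    stmt.executeUpdate();
                    conn.commit();
                    return;
                } catch (SQLException e) {
                    if ("40001".equals(e.getSQLState())) {
                        conn.rollback(); // serialization failure: roll back and retry
                    } else {
                        throw e;
                    }
                }
            }
        }
    }
}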

How to make transactions concurrent

So, I have a concurrent application that I am building using Scala, Akka and Spring.
I create writer actors and pass each one a chunk of data. This chunk of data belongs to 3 different classes, hence 3 different tables. There are parent-child relations between these 3 classes, so the processing and insertion have to happen serially. Further, there is a requirement that the whole chunk is inserted or nothing at all; hence the need for a transaction.
Essentially, from my writer I call an insert method like the one below.
@Transactional
def insert(): Unit = {
  // all three saves must commit or roll back together
  repo.save(obj1)
  repo.save(obj2)
  repo.batchSave(List(obj3))
}
This happens from all my writers. Without the @Transactional, the system is highly concurrent and fast. However, with it, everything becomes serialized; that is, all my chunks are written one after the other, destroying all my concurrency. So, what am I missing, if anything? Or is this a mandatory trade-off, meaning it is not possible to have both transactions and concurrency?
Also, a very basic question about transactions.
Let's say there are 2 transactions, T1 and T2:
T1
begin
insert1
insert2
insert3
commit
T2
begin
insert4
insert5
insert6
commit
If I have 2 transactions as above, with inserts as the only operations, will they be parallelized or serialized? Is it the case that, once T1 begins, it will release its locks only after commit? How can this be parallelized? Because all the isolation levels seem to talk about is a read and a write happening in parallel, and hence the case for dirty reads and READ_UNCOMMITTED.
Additional details:
Sybase relational DB
Spring JDBC JdbcTemplate for inserts
Isolation levels: tried the default and READ_UNCOMMITTED
Any guidance or ideas would be immensely helpful. Thanks
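For reference, a minimal sketch (table, column and record names are made up) of what a batched child insert might look like with Spring's JdbcTemplate, so that each writer issues one batched statement per table inside its transaction rather than one statement per row:

import java.util.List;

import org.springframework.jdbc.core.JdbcTemplate;

public class ChildBatchWriter {

    private final JdbcTemplate jdbcTemplate;

    public ChildBatchWriter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // One batched round trip per chunk of rows instead of one statement per row.
    public void batchSave(List<Child> children) {
        jdbcTemplate.batchUpdate(
                "INSERT INTO child (parent_id, value) VALUES (?, ?)",
                children,
                100, // batch size
                (ps, child) -> {
                    ps.setLong(1, child.parentId());
                    ps.setString(2, child.value());
                });
    }

    // Hypothetical row type used only for this sketch.
    public record Child(long parentId, String value) {}
}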

Resources