Would setting the JPA transaction isolation level to Serializable be enough to deal with extreme concurrency? - spring

An extreme condition would be something like this: there are only one or two products left but 1,000 people want to buy them at the very same time.
In this case, would @Transactional(isolation = Isolation.SERIALIZABLE) be enough to deal with it?
Or do I need to set @Lock(LockModeType.PESSIMISTIC_WRITE) on the JPA repository method as well?
I know the isolation level is about read consistency whereas the lock is about write consistency, but this is so confusing.
Any advice would be appreciated!
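For reference, this is roughly how the two options look in Spring code (a sketch only; Product, ProductRepository and PurchaseService are hypothetical names, imports and error handling omitted):

    public interface ProductRepository extends JpaRepository<Product, Long> {

        // @Lock makes Spring Data issue a SELECT ... FOR UPDATE, so the row stays
        // locked until the surrounding transaction ends and concurrent buyers queue up.
        @Lock(LockModeType.PESSIMISTIC_WRITE)
        @Query("select p from Product p where p.id = :id")
        Optional<Product> findByIdForUpdate(@Param("id") Long id);
    }

    @Service
    public class PurchaseService {

        @Autowired
        private ProductRepository productRepository;

        // Option 1: SERIALIZABLE isolation alone. Conflict handling is left to the
        // database, which typically aborts one of the competing transactions, so a
        // serialization failure has to be caught and retried.
        @Transactional(isolation = Isolation.SERIALIZABLE)
        public void buyWithSerializableIsolation(Long productId) {
            Product product = productRepository.findById(productId).orElseThrow();
            // check remaining stock, decrement it, save the order ...
        }

        // Option 2: pessimistic write lock. The second buyer blocks until the first
        // commits, so the default READ_COMMITTED isolation is usually sufficient.
        @Transactional
        public void buyWithPessimisticLock(Long productId) {
            Product product = productRepository.findByIdForUpdate(productId).orElseThrow();
            // check remaining stock, decrement it, save the order ...
        }
    }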

Related

Nested transactions in Spring and Hibernate

I have a Spring Boot application with persistence using Hibernate/JPA.
I am using transactions to manage my database persistence, and I am using the @Transactional annotation to define the methods that should execute transactionally.
I have three main levels of transaction granularity when persisting:
Batches of entities to be persisted
Single entities to be persisted
Single database operations that persist an entity
Therefore, you can imagine that I have three levels of nested transactions when thinking about the whole persistence flow.
The interaction between levels 2 and 3 works transparently, as I desire, because without specifying any propagation behaviour for the transaction the default is REQUIRED, and so the entire entity (level 2) is rolled back because level 3 joins the transaction defined in level 2.
However, the problem is that I need an interaction between levels 1 and 2 that is slightly different: I need an entity to be rolled back individually if an error occurs, but I wouldn't like the entire batch to be rolled back. That being said, I need to specify a propagation behaviour in the level 2 annotation @Transactional(propagation = X) that meets these requirements.
I've tried REQUIRES_NEW but that doesn't work because it commits some of the entities from level 2 even if the whole batch had to be rolled back, which can also happen.
The behaviour that seems to fit the description better is NESTED, but that is not accepted when using Spring with Hibernate JPA; see here for more information.
This last link offers alternatives to the NESTED type, but I would like to know whether NESTED would really have solved my problem, or whether another behaviour would have suited the job better.
I guess NESTED would roughly do what you want, but I would question whether this is really necessary. I don't know what you are trying to do or what the error condition is, but maybe you can get rid of the error condition by using some kind of WHERE clause or an UPSERT statement: Hibernate Transactions and Concurrency Using attachDirty (saveOrUpdate)
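A sketch of that WHERE-clause idea with Spring Data JPA (the Item entity, its status field and the repository method are hypothetical): a conditional bulk update is atomic on the database side, returns the number of affected rows, and so lets the level-2 code treat "0 rows updated" as a per-entity failure without throwing and rolling anything back.

    public interface ItemRepository extends JpaRepository<Item, Long> {

        // Succeeds only when the row is still in the expected state; a return value
        // of 0 signals the conflict without raising an exception, so the surrounding
        // batch transaction is not marked rollback-only.
        @Modifying
        @Query("update Item i set i.payload = :payload where i.id = :id and i.status = 'PENDING'")
        int updatePayloadIfPending(@Param("id") Long id, @Param("payload") String payload);
    }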

How to update ReadModel of an Aggregate that has an association with another Aggregate

I'm trying to separate read and write models. In summary, I have these 2 entities with an association between them:
// AggregateRoot
class ProfessionalFamily {
    private ProfessionalFamilyId id;
    private String name;
}

// AggregateRoot
class Group {
    private GroupId id;
    private String literal;
    private ProfessionalFamilyId professionalFamilyId; // ManyToOne association referenced by the ID of "professional-family"
}
The read model I'm using to return data in a grid is the following one.
class GroupReadModel {
    private String id;
    private String groupLiteral;
    private String professionalFamilyName;
}
I want to use NoSql for the read-model queries and keep them separate from the write models. But my headache is: with that approach, when a Group is created I fire an event (GroupCreated), and an event handler listens to the event and stores the Read/View/Projection model in the NoSql database. So my question is: if I need to update the professionalFamilyName and it is related to more than, for example, 1,000 groups (there can be many more groups), how can I update all the Groups in the read model that are related to the ProfessionalFamily I've just updated? Most probably I'm not doing anything well.
Thanks a lot.
NoSql databases are usually not designed to support data normalization and even intentionally break with this concept. If you were using a relational database system you would usually normalize your data, and for each group you would only store the id of the ProfessionalFamily rather than duplicating its name in each group document. So in general, for NoSql databases duplication is accepted.
But I think before deciding to go with NoSql or a relational database you should consider (at least) the following:
Priority for speed of reads vs. writes:
If you need your writes (in your case, changes of the name) to be very fast because they happen very often, and read speed is of lower priority, maybe NoSql is not the best choice. You could still look into technology such as MongoDB, which provides some kind of hybrid approach and allows you to normalize and index data to a certain extent.
Writes will usually be faster with a normalized structure in a relational database, whereas reads will normally be faster without normalization and with duplication in a NoSql database. But this of course depends on the technologies you are comparing, as well as the number of entities (in your case Groups) we are talking about and the amount of cross-referenced data. If you need to do lots of joins during reads due to normalization, your read performance will usually be worse compared to Group documents where all required data is already there due to duplication.
Control over the data structure/schema
If you are the one who knows what the data will look like, you might not need the advantage of a NoSql database, which is very well suited for data structures that change frequently or that you are not in control of. If this is not really the case, you might not benefit enough from NoSql technology.
In addition, there is another thing to consider: how consistent does your read model data have to be? As you have some kind of event-sourcing approach, I guess you are already embracing eventual consistency. That means not only is the event processing performed asynchronously, but you could also accept that - getting back to your example - not all groups are updated with the new family name at the same time, but rather asynchronously or via some background job, if it is not a problem that one Group still shows the old name while another group already shows the new name for some time.
Most probably I'm not doing anything well.
You are not doing anything wrong or right per se by choosing this approach, as long as you decide for (or against) NoSql for the right reasons, which include these considerations.
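As a rough sketch of the asynchronous update mentioned above, assuming Spring Data MongoDB as the read store, that the group projection also stores the professionalFamilyId (so the affected documents can be found), and a hypothetical ProfessionalFamilyRenamed event:

    @Component
    public class ProfessionalFamilyRenamedHandler {

        @Autowired
        private MongoTemplate mongoTemplate;

        // One multi-document update touches every group projection that references
        // the renamed family; readers may briefly see the old name, which is the
        // eventual-consistency trade-off discussed above.
        @EventListener
        public void on(ProfessionalFamilyRenamed event) {
            mongoTemplate.updateMulti(
                    Query.query(Criteria.where("professionalFamilyId").is(event.getProfessionalFamilyId())),
                    Update.update("professionalFamilyName", event.getNewName()),
                    GroupReadModel.class);
        }
    }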
My team and I discussed a similar scenario recently and we solved it by changing our CRUD approach to a DDD approach. Here is an example:
Given a traveler with a list of visited destinations.
If I have an event such as destinationUpdated then I should loop across every traveler like you said, but does that make sense? What does destinationUpdated mean from a user's point of view? Nothing! You should find the real user intent.
If the traveler made a mistake entering his visited destination, then your event should be travelerCorrectedDestination, which solves the problem because travelerCorrectedDestination now contains the traveler ID, so you don't have to loop through all travelers anymore.
By applying a DDD approach, problems usually solve themselves.
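A minimal sketch of that idea (class, field and accessor names are made up): because the event carries the aggregate id, the projection update touches exactly one document instead of looping over all travelers.

    // Hypothetical event expressing the user's intent and carrying the aggregate id.
    public class TravelerCorrectedDestination {

        private final String travelerId;
        private final String correctedDestination;

        public TravelerCorrectedDestination(String travelerId, String correctedDestination) {
            this.travelerId = travelerId;
            this.correctedDestination = correctedDestination;
        }

        public String getTravelerId() {
            return travelerId;
        }

        public String getCorrectedDestination() {
            return correctedDestination;
        }
    }

    // The projection handler then loads the single read model keyed by travelerId
    // and replaces the destination, instead of scanning every traveler document.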

How to make transactions concurrent

So, I have a concurrent application that I am building using Scala, Akka and Spring.
I create writer actors and pass a chunk of data to each. This chunk of data belongs to 3 different classes, hence 3 different tables. There are parent-child relations between these 3 classes, so the processing and insertion have to happen serially. Further, there is a requirement that either the whole chunk is inserted or none of it at all. Hence the need for a transaction.
Essentially, from my writer I call an insert method as described below.
@Transactional
def insert(): Unit = {
    repo.save(obj1)
    repo.save(obj2)
    repo.batchSave(List(obj3))
}
This happens from all my writers. Without the @Transactional annotation, the system is highly concurrent and fast. However, with it, everything becomes serialized. That is, all my chunks are written one after the other, destroying all my concurrency. So, what am I missing, if anything, or is this a mandatory trade-off, meaning it is not possible to have both transactions and concurrency?
Also, a very basic doubt about transactions.
Let's say there are 2 transactions, T1 and T2
T1
begin
insert1
insert2
insert3
commit
T2
begin
insert4
insert5
insert6
commit
If I have 2 transactions as above with inserts as the only operations, will they be parallelized or serialized? Is it the case that once T1 begins, it will release its locks only after commit? How can this be parallelized? Because all the isolation levels talk about is a read and a write happening in parallel, hence the case for dirty reads and READ_UNCOMMITTED.
Additional details:
Sybase relational DB
SpringJDBC jdbcTemplate for inserts
Isolation levels: tried default and READ_UNCOMMITTED
Any guidance or ideas would be immensely helpful. Thanks

Which Spring transaction isolation level to use to maintain a counter for products sold?

I have an e-commerce site written with Spring Boot + Angular. I need to maintain a counter in my product table to track how many have been sold. But the counter sometimes becomes inaccurate when many users are ordering the same item concurrently.
In my service code, I have the following transactional declaration:
@Transactional(propagation = Propagation.REQUIRES_NEW, isolation = Isolation.READ_COMMITTED)
in which, after persisting the order (using CrudRepository.save()), I do a select query to sum the quantities ordered so far, hoping the select query will count all the orders that have been committed. But that doesn't seem to be the case; from time to time the counter is less than the actual number.
The same issue happens for my other use case: limiting the quantity of a product. I use the same transaction isolation setting. In the code, I do a select query to see how many have been sold and throw an out-of-stock error if we can't fulfill the order. But for hot items, we sometimes oversell the item because each thread doesn't see the orders just committed in other threads.
So is READ_COMMITTED the right isolation level for my use case? Or should I use pessimistic locking for this use case?
UPDATE 05/13/17
I chose Ruben's approach as I know more about Java than databases, so I took the road that was easier for me. Here's what I did.
@Transactional(propagation = Propagation.REQUIRES_NEW, isolation = Isolation.SERIALIZABLE)
public void updateOrderCounters(Purchase purchase, ACTION action)
I'm using JpaRepository, so I don't work with the entityManager directly. Instead, I just put the code to update the counters in a separate method and annotated it as above. It seems to work well so far. I have seen >60 concurrent connections placing orders with no overselling, and the response time seems OK as well.
Depending on how you retrieve the total sold items count, the available options might differ:
1. If you calculate the sold items count dynamically via a sum query on orders
I believe in this case the option you have is using the SERIALIZABLE isolation level for the transaction, since this is the only one which supports range locks and prevents phantom reads.
However, I would not really recommend going with this isolation level, since it has a major performance impact on your system (or use it really carefully, in well-designed spots only).
Links: https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html#isolevel_serializable
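In Spring terms that option would look roughly like this (a sketch; the repository sum query, the stock-limit lookup and OutOfStockException are hypothetical):

    @Transactional(isolation = Isolation.SERIALIZABLE)
    public void placeOrder(Long productId, int quantity) {
        // Sum over committed orders; SERIALIZABLE prevents the phantom read between
        // this aggregate query and the insert below.
        long alreadySold = orderRepository.sumQuantityByProductId(productId);
        if (alreadySold + quantity > stockLimitFor(productId)) {
            throw new OutOfStockException(productId);
        }
        orderRepository.save(new Order(productId, quantity));
        // Note: the database may abort one of two competing transactions with a
        // serialization failure, which the caller then has to retry.
    }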
2. If you maintain a counter on product or some other row associated with the product
In this case I would probably recommend using row-level locking, e.g. SELECT ... FOR UPDATE, in a service method which checks the availability of the product and increments the sold items count. The high-level algorithm of the product placement could be similar to the steps below:
Retrieve the row storing the number of remaining/sold items using a SELECT ... FOR UPDATE query (@Lock(LockModeType.PESSIMISTIC_WRITE) on a repository method).
Make sure that the retrieved row has up-to-date field values, since it could be served from the Hibernate session-level cache (Hibernate would just execute the SELECT ... FOR UPDATE on the id to acquire the lock). You can achieve this by calling entityManager.refresh(entity).
Check the count field of the row and, if the value is fine with your business rules, increment/decrement it.
Save the entity, flush the Hibernate session, and commit the transaction (explicitly or implicitly).
Meta code is below:
@Transactional
public Product performPlacement(@Nonnull final Long id) {
    Assert.notNull(id, "Product id should not be null");
    entityManager.flush();
    final Product product = entityManager.find(Product.class, id, LockModeType.PESSIMISTIC_WRITE);
    // Make sure to get the latest version from the database after acquiring the lock,
    // since if a load was performed earlier in the same Hibernate session then Hibernate
    // will only acquire the lock but use field values from the cache
    entityManager.refresh(product);
    // Execute check and booking operations
    // This method call could just check if availableCount > 0
    if (product.isAvailableForPurchase()) {
        // This method could potentially just decrement the available count, e.g. --availableCount
        product.registerPurchase();
    }
    // Persist the updated product
    entityManager.persist(product);
    entityManager.flush();
    return product;
}
This approach will make sure that no two threads/transactions ever perform the check and update on the same row storing the count of a product concurrently.
However, because of that it will also have some performance-degradation effect on your system, so it is essential to make sure that the atomic increment/decrement is used as late in the purchase flow as possible and as rarely as possible (e.g., right in the checkout handling routine when the customer hits pay). Another useful trick for minimizing the effect of the lock is adding that 'count' column not on the product itself but on a different table which is associated with the product. This prevents you from locking the product rows, since the locks are acquired on a different row/table combination which is used purely during the checkout stage.
Links: https://dev.mysql.com/doc/refman/5.7/en/innodb-locking-reads.html
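A sketch of that separate-row trick (the ProductStock entity is hypothetical): checkout-time locks are acquired on this row, so plain reads of the Product rows themselves are never blocked.

    @Entity
    public class ProductStock {

        @Id
        private Long productId; // same value as the associated product's id

        private int availableCount;

        // Used by the locked check in performPlacement-style methods.
        public boolean isAvailableForPurchase() {
            return availableCount > 0;
        }

        public void registerPurchase() {
            --availableCount;
        }
    }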
Summary
Please note that both of the techniques introduce extra synchronization points in your system, hence reducing throughput. So please make sure to carefully measure the impact on your system via a performance test or whatever other technique your project uses for measuring throughput.
Quite often online shops choose to go towards overselling/overbooking some items rather than hurting performance.
Hope this helps.
With these transaction settings, you should see the stuff that is committed. But still, your transaction handling isn't watertight. The following might happen:
Let's say you have one item in stock left.
Now two transactions start, each ordering one item.
Both check the inventory and see: "Fine enough stock for me."
Both commit.
Now you oversold.
Isolation level serializable should fix that. BUT
the isolation levels available in different databases vary widely, so I don't think it is actually guaranteed to give you the requested isolation level
this seriously limits scalability. The transactions doing this should be as short and as rare as possible.
Depending on the database you are using, it might be a better idea to implement this with a database constraint. In Oracle, for example, you could create a materialized view calculating the complete stock and put a constraint on the result to be non-negative.
Update
For the materialized view approach you do the following.
Create a materialized view that calculates the value you want to constrain, e.g. the sum of orders. Make sure the materialized view gets updated in the transactions that change the content of the underlying tables.
For Oracle this is achieved with the ON COMMIT clause.
ON COMMIT Clause
Specify ON COMMIT to indicate that a fast refresh is to occur whenever the database commits a transaction that operates on a master table of the materialized view. This clause may increase the time taken to complete the commit, because the database performs the refresh operation as part of the commit process.
See https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_6002.htm for more details.
Put a check constraint on that materialized view to encode the constraint you want, e.g. that the value is never negative. Note that a materialized view is just another table, so you can create constraints just as you normally would.
See for example https://www.techonthenet.com/oracle/check.php

Is Hibernate good for batch processing? What about memory usage?

I have a daily batch process that involves selecting out a large number of records and formatting up a file to send to an external system. I also need to mark these records as sent so they are not transmitted again tomorrow.
In my naive JDBC way, I would prepare and execute a statement and then begin to loop through the result set. As I only go forwards through the result set, there is no need for my application server to hold the whole result set in memory at one time. Groups of records can be fed across from the database server.
Now, let's say I'm using Hibernate. Won't I end up with a bunch of objects representing the whole result set in memory at once?
Hibernate also iterates over the result set, so only one row is kept in memory. This is the default. If you want it to load everything eagerly, you must tell it so.
Reasons to use Hibernate:
"Someone" was "creative" with the column names (PRXFC0315.XXFZZCC12)
The DB design is still in flux and/or you want one place where column names are mapped to Java.
You're using Hibernate anyway
You have complex queries and you're not fluent in SQL
Reasons not to use Hibernate:
The rest of your app is pure JDBC
You don't need any of the power of Hibernate
You have complex queries and you're fluent in SQL
You need a specific feature of your DB to make the SQL perform
Hibernate offers some possibilities to keep the session small.
You can use Query.scroll() or Criteria.scroll() for JDBC-like scrolling. You can use Session.evict(Object entity) to remove entities from the session. You can use a StatelessSession to suppress dirty checking. And there are some more performance optimizations; see the Hibernate documentation.
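For illustration, a scrolling batch loop with a periodic flush/clear might look like this (a sketch; exact API details vary between Hibernate versions, and MyRecord, writeToFile and the sent flag are made-up names):

    ScrollableResults results = session
            .createQuery("from MyRecord r where r.sent = false")
            .scroll(ScrollMode.FORWARD_ONLY);

    int count = 0;
    while (results.next()) {
        MyRecord record = (MyRecord) results.get(0); // only the current row is materialized
        writeToFile(record);                         // format and append to the outgoing file
        record.setSent(true);                        // mark as transmitted

        if (++count % 100 == 0) {                    // keep the session (and memory) small
            session.flush();
            session.clear();
        }
    }
    results.close();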
Hibernate, like any ORM framework, is intended for developing and maintaining systems based on object-oriented programming principles. But most databases are relational rather than object-oriented, so in any case an ORM is always a trade-off between convenient OOP programming and optimized/most effective DB access.
I wouldn't use ORM for specific isolated tasks, but rather as an overall architectural choice for application persistence layer.
In my opinion I would NOT use Hibernate, since it makes your application a whole lot bigger and less maintainable and you do not really have a chance of quickly optimizing the generated SQL.
Furthermore, you could use all the SQL functionality the JDBC bridge supports and are not limited to the Hibernate functionality. Another thing is that you also take on the limitations that come along with each additional layer of code.
But in the end it is a philosophical question, and you should do it the way that fits your way of thinking best.
If there are possible performance issues then stick with the JDBC code.
There are a number of well-known pure SQL optimisations which would be very difficult to do in Hibernate.
Only select the columns you use! (No "select *" stuff.)
Keep the SQL as simple as possible. E.g. don't include small reference tables like currency codes in the join. Instead, load the currency table into memory and resolve currency descriptions with a program lookup.
Depending on the DBMS, minor re-ordering of the SQL WHERE predicates can have a major effect on performance.
If you are updating/inserting, only commit every 100 to 1000 updates, i.e. do not commit every unit of work but keep a counter so you commit less often (see the sketch after this list).
Take advantage of the aggregate functions of your database. If you want totals by DEPT code then do it in the SQL with "SUM(amount) ... GROUP BY DEPT".
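A plain-JDBC sketch of the commit-every-N idea above (table, column and variable names are made up):

    try (Connection connection = dataSource.getConnection();
         PreparedStatement statement = connection.prepareStatement(
                 "UPDATE outbound_record SET sent = 'Y' WHERE id = ?")) {

        connection.setAutoCommit(false);
        int count = 0;

        for (long id : processedIds) {
            statement.setLong(1, id);
            statement.addBatch();

            if (++count % 500 == 0) {   // commit every 500 updates, not every row
                statement.executeBatch();
                connection.commit();
            }
        }
        statement.executeBatch();       // flush and commit the remainder
        connection.commit();
    }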
