How best to handle Constraint Violations in Spring JDBC

What is considered the best/most correct way to handle a Constraint Violation in Spring JDBC?
As a real example, I've got a users table that contains, amongst other things, url_slug and email columns. Both of these are UNIQUE. When creating a new record, or updating an existing one, if the value would duplicate the value in another record of this table, I want to return a sensible error back to the caller.
The only options I can think of are both flawed. I'm doing this in Postgres 10, using Spring NamedParameterJdbcTemplate in case that matters at all.
Check the data before doing the INSERT/UPDATE.
This will involve an extra query on every Insert/Update call, and has a race condition that means it might still not catch it. For example:
1. Thread 1 starts transaction
2. Thread 2 starts transaction
3. Thread 1 queries data
4. Thread 2 queries data
5. Thread 1 does update
6. Thread 2 does update
7. Thread 1 does commit
8. Thread 2 does commit <-- Constraint Violation, even though at #4 the data was fine
Handle the DuplicateKeyException.
The problem here is that it's not thrown until the Transaction is committed, at which point it might well be unclear exactly which SQL call failed, which constraint failed, or anything else like that.

There is no "best" way to handle these kinds of exceptions, other than putting the call into a try-catch block and propagating a sensible error message back to the user.
Of course, in your example you could use the SERIALIZABLE isolation level, which essentially executes transactions one-by-one and makes sure this cannot happen, but the problem is that you most probably don't want that. Another way would be to lock the table for the entire transaction, but I wouldn't advise that either.
Simply put your transactional call into a try-catch block and handle it as you want.
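For the concrete case in the question, a minimal sketch might look like the following (assuming a hypothetical UserDao wired with a NamedParameterJdbcTemplate and the users table with its unique url_slug and email columns). Spring translates the underlying SQLException into a DuplicateKeyException, a subclass of DataIntegrityViolationException, which can be caught and turned into a caller-friendly error:

import org.springframework.dao.DuplicateKeyException;
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

public class UserDao {

    private final NamedParameterJdbcTemplate jdbc;

    public UserDao(NamedParameterJdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Hypothetical create method: insert the row and translate a unique-constraint
    // violation into an application-level error for the caller.
    public void createUser(String urlSlug, String email) {
        String sql = "INSERT INTO users (url_slug, email) VALUES (:url_slug, :email)";
        try {
            jdbc.update(sql, new MapSqlParameterSource()
                    .addValue("url_slug", urlSlug)
                    .addValue("email", email));
        } catch (DuplicateKeyException e) {
            // The message from Postgres names the violated constraint
            // (e.g. users_email_key), so the caller can be told which field clashed.
            throw new IllegalArgumentException("url_slug or email is already in use", e);
        }
    }
}

The constraint name in the wrapped SQLException's message is what distinguishes a clash on url_slug from one on email, which addresses the concern above about not knowing which constraint failed.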

Related

Spring jpa performance the smart way

I have a service that listens to multiple queues and saves the data to a database.
One queue gives me a person.
Now if I code it really simply, I just get one message from the queue at a time.
I do the following
Start transaction
Select from person table to check if it exists.
Either update existing or create a new entity
repository.save(entity)
End transaction
The above is clean and robust. But I get a lot of messages and it's not fast enough.
To improve performance I have done this.
Fetch 100 messages from queue
then
Start transaction
Select all persons where id in (...) in one query, using ids from the incoming persons
Iterate messages and for each one check if it was selected above. If yes, then update it; if not, then create a new one
Save all changes with batch update/create
End transaction
If it's a simple message the above is really good. It performs. But if the message is complicated, or the logic I have to run when I get the message is, then the above is not so good, since there is a chance some of the messages will result in a rollback and the code becomes hard to read.
Any ideas on how to make it run fast in a smarter way?
Why do you need to rollback? Can't you just not execute whatever it is that then has to be rolled back?
IMO the smartest solution would be to code this with a single "upsert" statement. Not sure which database you use, but PostgreSQL for example has the ON CONFLICT clause for inserts that can be used to do updates if the row already exists. You could even configure Hibernate to use that on insert by using the @SQLInsert annotation.
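As an illustration of the upsert idea (a sketch only, using Spring's NamedParameterJdbcTemplate directly rather than the Hibernate @SQLInsert route, and a made-up person table with id and name columns), the whole chunk of messages can be written in one batch, and duplicates stop being an error at all:

import java.util.List;
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;
import org.springframework.jdbc.core.namedparam.SqlParameterSource;

public class PersonUpsertDao {

    private final NamedParameterJdbcTemplate jdbc;

    public PersonUpsertDao(NamedParameterJdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Hypothetical shape of an incoming message.
    public record PersonMessage(long id, String name) {}

    // Insert-or-update the whole batch: existing rows are updated in place, new
    // rows are created, and no pre-select or rollback handling is needed.
    public void upsertAll(List<PersonMessage> messages) {
        String sql = "INSERT INTO person (id, name) VALUES (:id, :name) "
                   + "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name";
        SqlParameterSource[] batch = messages.stream()
                .map(m -> new MapSqlParameterSource()
                        .addValue("id", m.id())
                        .addValue("name", m.name()))
                .toArray(SqlParameterSource[]::new);
        jdbc.batchUpdate(sql, batch);
    }
}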

How to create a threadsafe insert or update with Hibernate (dealing with optimistic locking)

My problem.
I have a simple table, token. It has only a few attributes: id, token, username, version and an expire_date.
I have a REST service that will create or update a token. So when a user requests a token, I would like to check if the user (using the username) already has an entry; if yes, then simply update the expire_date and return, and if there is no entry, create a new one. The problem is that if I create a test with a few concurrent users (using a JMeter script) that call the REST service, Hibernate will very quickly
throw a StaleObjectStateException, because what happens is: thread A will select the row for the user, change the expire_date and bump the version, meanwhile thread B will do the same but will actually manage to commit before thread A commits. Now when thread A commits, Hibernate detects the version change and will throw the exception and roll back. All works as documented.
But what I would like to happen is that thread B waits for thread A to finish before doing its thing.
What is the best way to solve this? Should I use the Java concurrency package and implement locks? Or is it a better option to implement a custom JPA isolation level?
Thanks
If you are using a JEE server, the EJB container will do it for you using @Singleton.
I think the best way is using a JPA lock to acquire a lock on the resources you are currently updating (a row lock). Don't put your effort into implementing row locking with Java concurrency yourself. For example, it will be much easier to lock the row containing user "john.doe" at the DBMS level rather than finding a way to lock a specific row using concurrency in your code.
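A minimal sketch of that idea with JPA's pessimistic locking (the Token entity and field names here are assumed from the question, not taken from real code). With LockModeType.PESSIMISTIC_WRITE the second request blocks on the row in the database until the first one commits, instead of failing afterwards with a StaleObjectStateException:

import java.time.LocalDateTime;
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.LockModeType;
import javax.persistence.TypedQuery;
import javax.persistence.Version;

@Entity
class Token {
    @Id @GeneratedValue Long id;
    String username;
    String token;
    LocalDateTime expireDate;
    @Version long version;
}

public class TokenService {

    private final EntityManager em;

    public TokenService(EntityManager em) {
        this.em = em;
    }

    // Must be called inside a transaction.
    public Token createOrRefresh(String username) {
        TypedQuery<Token> query = em.createQuery(
                "select t from Token t where t.username = :username", Token.class);
        query.setParameter("username", username);
        query.setLockMode(LockModeType.PESSIMISTIC_WRITE); // row lock taken in the DBMS

        List<Token> existing = query.getResultList();
        if (existing.isEmpty()) {
            Token created = new Token();
            created.username = username;
            created.expireDate = LocalDateTime.now().plusHours(1);
            em.persist(created);
            return created;
        }
        Token found = existing.get(0);
        found.expireDate = LocalDateTime.now().plusHours(1); // just refresh the expiry
        return found;
    }
}

A unique constraint on username is still worth having for the case where two first-time requests race down the insert path.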

Oracle- relying on ROLLBACK for data validation

Is relying on the Oracle ROLLBACK command good practice for importing data, validating the data and THEN performing a ROLLBACK if the validation fails?
I've had a data import program built for our ERP, and looking at the code, they insert the data into the real tables, validate, and if it fails validation, they perform a ROLLBACK. I've always validated data before inserting, but I'm just curious whether this is an accepted method to rely on.
There are a few things to remember here:
Constraints enable us to preserve data integrity. This means that constraints allow us to enforce business rules (or at least the most basic of those) at the database level itself.
A commit or a rollback is a method of preserving or undoing the changes made in a transaction. If you issue a commit after a series of successfully run DML statements, the changes are preserved. The rollback statement would undo the changes.
If, in a series of DML statements, one of them fails, the effects of that particular statement are rolled back. E.g., if an UPDATE statement updates 10 rows and one of those violates a vital constraint, none of the 10 rows is updated. Yet, the effects of the preceding statements are not implicitly rolled back.
In order to preserve data integrity and keep the data as per the business requirements, you must issue a manual ROLLBACK statement if any of the DMLs fail.
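Translated into plain JDBC (a hedged sketch; the actual import program is presumably PL/SQL or a vendor tool, and the import_target table here is made up for illustration), the commit-or-rollback pattern looks like this:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ImportRunner {

    // Insert the batch and let the database constraints validate it:
    // commit only if every statement succeeded, otherwise roll the whole lot back.
    public void runImport(Connection conn, Iterable<String> names) throws SQLException {
        conn.setAutoCommit(false); // take manual control of the transaction
        try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO import_target (name) VALUES (?)")) {
            for (String name : names) {
                ps.setString(1, name);
                ps.executeUpdate(); // a constraint violation throws SQLException here
            }
            conn.commit();          // everything passed, preserve the changes
        } catch (SQLException e) {
            conn.rollback();        // undo the effects of all preceding statements
            throw e;
        }
    }
}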
What you are seeing in your program is the same practice. It doesn't issue a ROLLBACK after a successful transaction, but only after a failed DML, if you look at the code closely. This is indeed a good practice to roll back on failure and commit only if everything goes right.
Front end checks on data are indeed an essential part of any application. This ensures that the data being entered conforms to the business rules. Even in this case, constraints must be applied to perform checks at the database level. This is particularly helpful when some rookie makes changes to the front end and tries to enter invalid data. This is also helpful when someone is bypassing the application and entering data manually. Hence putting constraints at the database level is always necessary.

One data store. Multiple processes. Will this SQL prevent race conditions?

I'm trying to create a Ruby script that spawns several concurrent child processes, each of which needs to access the same data store (a queue of some type) and do something with the data. The problem is that each row of data should be processed only once, and a child process has no way of knowing whether another child process might be operating on the same data at the same instant.
I haven't picked a data store yet, but I'm leaning toward PostgreSQL simply because it's what I'm used to. I've seen the following SQL fragment suggested as a way to avoid race conditions, because the UPDATE clause supposedly locks the table row before the SELECT takes place:
UPDATE jobs
SET status = 'processed'
WHERE id = (
SELECT id FROM jobs WHERE status = 'pending' LIMIT 1
) RETURNING id, data_to_process;
But will this really work? It doesn't seem intuitive that Postgres (or any other database) could lock the table row before performing the SELECT, since the SELECT has to be executed to determine which table row needs to be locked for updating. In other words, I'm concerned that this SQL fragment won't really prevent two separate processes from selecting and operating on the same table row.
Am I being paranoid? And are there better options than traditional RDBMSs to handle concurrency situations like this?
As you said, use a queue. The standard solution for this in PostgreSQL is PgQ. It has all these concurrency problems worked out for you.
Do you really want many concurrent child processes that must operate serially on a single data store? I suggest that you create one writer process that has sole access to the database (whatever you use) and accepts requests from the other processes to do the database operations you want. Then do the appropriate queue management in that thread rather than making your database do it, and you are assured that only one process accesses the database at any time.
The situation you are describing is called "Non-repeatable read". There are two ways to solve this.
The preferred way would be to set the transaction isolation level to at least REPEATABLE READ. This means that concurrent updates of the nature you described will fail: if two processes update the same rows in overlapping transactions, one of them will be canceled, its changes ignored, and an error returned. That transaction will have to be retried. This is achieved by calling
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
at the start of the transaction. I can't seem to find documentation that explains an idiomatic way of doing this for Ruby; you may have to emit that SQL explicitly.
The other option is to manage the locking of tables explicitly, which can cause a transaction to block (and possibly deadlock) until the table is free. Transactions won't fail in the same way as they do above, but contention will be much higher, and so I won't describe the details.
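A rough sketch of the retry loop in JDBC terms (the question is about Ruby, so take this only as an illustration of the mechanics; the SQL is the fragment from the question, and SQLSTATE 40001 is PostgreSQL's serialization_failure code):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JobClaimer {

    private static final String CLAIM_SQL =
            "UPDATE jobs SET status = 'processed' WHERE id = "
          + "(SELECT id FROM jobs WHERE status = 'pending' LIMIT 1) "
          + "RETURNING id, data_to_process";

    // Claim one pending job under REPEATABLE READ, retrying when the transaction
    // is canceled because another worker updated the same row.
    public Long claimOne(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
        while (true) {
            try (PreparedStatement ps = conn.prepareStatement(CLAIM_SQL);
                 ResultSet rs = ps.executeQuery()) {
                Long id = rs.next() ? rs.getLong("id") : null; // null: no pending jobs
                conn.commit();
                return id;
            } catch (SQLException e) {
                conn.rollback();
                if (!"40001".equals(e.getSQLState())) {
                    throw e; // not a serialization failure, give up
                }
                // serialization_failure: another worker won the race, retry
            }
        }
    }
}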
That's pretty close to the approach I took when I wrote pg_message_queue, which is a simple queue implementation for PostgreSQL. Unlike PgQ, it requires no components outside of PostgreSQL to use.
It will work just fine. MVCC will come to the rescue.

Achieving ACID properties using JDBC?

First of all, I would like to confirm: is it the responsibility of the developer to uphold these properties, or the responsibility of transaction APIs like JDBC?
Below is my understanding of how we achieve the ACID properties in JDBC.
Atomicity: as there is one transaction associated with a connection, we either commit or roll back, so there are no partial updates. Hence achieved.
Consistency: when some data integrity constraint is violated (say some check constraint) an SQLException will be thrown. Then the programmer achieves a consistent database by rolling back the transaction?
One question on the above: say we do transaction 1, and an SQLException is thrown during transaction 2 as explained above. Now if we catch the exception and do the commit, will the first transaction be committed?
Isolation: provided by the JDBC APIs. But this leads to the problem of concurrent updates, so it has to be dealt with manually, right?
Durability: provided by the JDBC APIs.
Please let me know if the above understanding is right.
ACID principles of transactional integrity are implemented by the database not by the API (like JDBC) or by the application. Your application's responsibility is to choose a database and a database configuration that supports whatever transactional integrity you need and to correctly identify the transactional boundaries in your application.
When an exception is thrown, your application has to determine whether it is appropriate to rollback the entire transaction or to proceed with additional processing. It may be appropriate if your application is processing orders from a vendor, for example, to process the 99 orders that succeed and log the 1 order that failed somewhere for users to investigate. On the other hand, you may reject all 100 orders because 1 failed. It depends what your application is doing.
In general, you only have one transaction open at a time (or, more accurately, one transaction per connection). So if you are working in transaction 2, transaction 1 by definition has already completed-- it was either committed or rolled back previously. Exceptions thrown in transaction 2 have no impact on transaction 1.
Depending on the transaction isolation level your application requests (and the transaction isolation levels your database supports) as well as the mechanics of your application, lost updates are something that you may need to be concerned about. If you set your transaction isolation level to read committed, it is possible that you would read a value as 'A' in transaction 1, wait for a user to do something, update the value to 'B', and commit without realizing that transaction 2 updated the value to 'C' between the time you read the data and the time you wrote the data. This may be a problem that you need to deal with or it may be something where it is fine for the last person to update a row to "win".
Your database, on the other hand, should take care of the automatic locking that prevents two transactions from simultaneously updating the same row of the same table. It may do this by locking more than is strictly necessary but it will serialize the updates somehow.
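The answer doesn't prescribe a particular fix, but one common way to deal with that read-committed lost update is an optimistic version check. This sketch uses a made-up account table with a version column (an illustration, not something taken from the answer) and simply reports whether the conditional UPDATE actually won:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AccountDao {

    // Carry a version column and make the UPDATE a compare-and-set, so a write
    // that raced with another transaction simply affects zero rows instead of
    // silently overwriting the other transaction's change.
    public boolean updateValue(Connection conn, long id, String newValue, long expectedVersion)
            throws SQLException {
        String sql = "UPDATE account SET value = ?, version = version + 1 "
                   + "WHERE id = ? AND version = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, newValue);
            ps.setLong(2, id);
            ps.setLong(3, expectedVersion);
            return ps.executeUpdate() == 1; // false: someone else updated the row first
        }
    }
}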

Resources