Cache tables with parallel services causing problems. unique constraint SQLException. Spring JDBC

I am using an Oracle database.
Here's how I think the SQLException happens.
Say I have two instances of a service running in parallel. Both of them do the following:
Query the cache (B) to see if the Person exists there.
If the Person exists but is out of date, or doesn't exist at all, query the main database (A).
If the Person is found in the database (A) and was NOT found earlier in the cache (B), INSERT it; if the Person was found in the cache earlier but was out of date, UPDATE the cache.
I use the following code to make the decision, based on the earlier query to cache B:
void insertOrUpdate(RegistryPersonMo person) {
    if (person.getId() == null) {
        insertPerson(person);
    } else {
        updatePerson(person);
    }
}
And I insert using Spring JDBC:
void insertPerson(RegistryPersonMo person) {
    Number id = insertInto("PERSON_REGISTRY", "RAAMAT")
            .usingGeneratedKeyColumns("ID")
            .executeAndReturnKey(usingParameters(person));
    if (id != null) {
        person.setId(id.longValue());
    }
}
The actual problem occurs when two instances of the service have both finished querying the cache (B) and the person wasn't found (null). One instance then does an INSERT, because the data did not exist. The other gets an SQLException when it tries to do the same, because an entry with the same unique key already exists.
Does anyone know what the best or standard workaround is? Some ideas I've had:
Lock reading of the row until the insert is done. Can I do this using Spring?
Use replace, or insert with ignore. I'm still learning; are there any downsides to these?
Bear in mind that I'd like to use Spring and automate the query as much as possible.

I think it's fine in this situation just to ignore the unique constraint exception. Yes, this is a race condition, but an expected one: the desired outcome is achieved, because the record gets inserted. Perhaps log it so you can tell how often this is happening.
Locking or transaction serialization would resolve this issue but wouldn't make much sense in this case, in my opinion.
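
The "ignore the exception, but count it" idea can be sketched as below. This is a minimal in-memory illustration, not the asker's code: the map stands in for cache table B, and DuplicateKeyException is a hypothetical local class defined in the sketch. In a real Spring JDBC setup the exception to catch would typically be Spring's DuplicateKeyException or its parent DataIntegrityViolationException, but which one your driver and Spring version produce is an assumption worth verifying.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class IgnoreDuplicateInsertSketch {
    /** Hypothetical stand-in for the exception raised on a unique-constraint violation. */
    static class DuplicateKeyException extends RuntimeException {
        DuplicateKeyException(String message) { super(message); }
    }

    // In-memory stand-in for cache table B; the map key plays the role of the unique column.
    private final Map<String, String> cacheTable = new ConcurrentHashMap<>();
    private final AtomicInteger ignoredDuplicates = new AtomicInteger();

    /** Plain insert: fails if a row with the same unique key already exists. */
    void insertPerson(String uniqueKey, String name) {
        if (cacheTable.putIfAbsent(uniqueKey, name) != null) {
            throw new DuplicateKeyException("unique constraint violated: " + uniqueKey);
        }
    }

    /** Returns true if this caller inserted the row, false if another instance got there first. */
    boolean insertIgnoringDuplicate(String uniqueKey, String name) {
        try {
            insertPerson(uniqueKey, name);
            return true;
        } catch (DuplicateKeyException e) {
            // The row is already there, which is the outcome we wanted; just count it.
            ignoredDuplicates.incrementAndGet();
            return false;
        }
    }

    int ignoredDuplicateCount() { return ignoredDuplicates.get(); }

    public static void main(String[] args) {
        IgnoreDuplicateInsertSketch cache = new IgnoreDuplicateInsertSketch();
        cache.insertIgnoringDuplicate("person-1", "Mari"); // first instance wins
        cache.insertIgnoringDuplicate("person-1", "Mari"); // second instance loses the race
        System.out.println("ignored duplicates: " + cache.ignoredDuplicateCount());
    }
}
```

The counter plays the role of the log line suggested above: if it grows quickly, the race is more common than expected and a different design may be worth considering.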

JPA @Version behavior when data is changed from unmanaged connection

I enabled @Version on the Customer table and ran the test below:
@Test
public void actionsTest1() throws InterruptedException {
    CustomerState t = customerStateRepository.findById(1L).get();
    Thread.sleep(20000);
    t.setInvoiceNumber("1");
    customerStateRepository.save(t);
}
While actionsTest1 is sleeping, I run actionsTest2 which updates the invoice number to 2.
@Test
public void actionsTest2() throws InterruptedException {
    CustomerState t = customerStateRepository.findById(1L).get();
    t.setInvoiceNumber("2");
    customerStateRepository.save(t);
}
When actionsTest1 returns from sleeping, it tries to update too and gets an ObjectOptimisticLockingFailureException.
Works as expected.
But suppose I run actionsTest1 and, while it is sleeping, open a SQL terminal and run this raw update:
update customer
set invoice_number='3' where id=1
When actionsTest1 returns from sleeping, its versioning mechanism doesn't catch the change, and it sets the value back to 1.
Is that expected behavior? Does versioning work only with connections managed by JPA?

It works as expected. If you do an update manually, you have to update the version as well.
If you are using JPA with @Version, JPA increments the version column for you.
To get the result you expect, you have to write the statement like this:
update customer set invoice_number='3', version=XYZ (maybe version+1) where id=1

Is that expected behavior?
Yes.
Does versioning work only with connections managed by JPA?
No, it also works with any other way of updating your data. But everything that updates the data has to adhere to the rules of optimistic locking:
Increment the version column whenever performing any update.
(Only required when the other process also wants to detect concurrent updates:) on every update, check that the version number hasn't changed since the data on which the update is based was loaded.
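
The two rules can be illustrated with a small in-memory sketch. It is not JPA or Hibernate code: the class and method names are made up for illustration, and the single object stands in for one row of the customer table. The comments show the SQL each method is meant to mirror.

```java
class OptimisticLockSketch {
    // In-memory stand-in for one row of the customer table.
    private String invoiceNumber = "0";
    private long version = 0;

    /** What a reader loaded: the value plus the version it saw at that moment. */
    static final class Snapshot {
        final String invoiceNumber;
        final long version;
        Snapshot(String invoiceNumber, long version) {
            this.invoiceNumber = invoiceNumber;
            this.version = version;
        }
    }

    synchronized Snapshot load() {
        return new Snapshot(invoiceNumber, version);
    }

    // Mirrors: UPDATE customer SET invoice_number = ?, version = version + 1
    //          WHERE id = ? AND version = ?
    synchronized boolean versionedUpdate(String newInvoiceNumber, long expectedVersion) {
        if (version != expectedVersion) {
            return false; // zero rows matched: someone else changed the row
        }
        invoiceNumber = newInvoiceNumber;
        version++;
        return true;
    }

    // Mirrors the raw statement from the question, which leaves the version column alone.
    synchronized void rawUpdateWithoutVersion(String newInvoiceNumber) {
        invoiceNumber = newInvoiceNumber;
    }

    synchronized String currentInvoiceNumber() {
        return invoiceNumber;
    }

    public static void main(String[] args) {
        OptimisticLockSketch row = new OptimisticLockSketch();
        Snapshot stale = row.load();
        row.rawUpdateWithoutVersion("3");
        // The version was never bumped, so the stale writer is not rejected.
        System.out.println(row.versionedUpdate("1", stale.version));
    }
}
```

A writer that skips the version increment is invisible to everyone else's check, which is exactly what happens with the raw update in the question.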

Hibernate automatically increments the value in the @Version-mapped column in your database.
When you fetch an entity record, Hibernate keeps a copy of the record's data along with the value of @Version. While performing a merge or update operation, Hibernate checks whether the current version value in the database still matches the copy of the entity fetched earlier.
If the value matches, the entity is not dirty (not updated by any other transaction); otherwise an exception is thrown.

Unexpected behavior in Spring Batch partitioning when using synchronized

I am using Spring Batch with partitioning to do parallel processing, and Hibernate and Spring Data JPA for the database. For the partition step, the reader, processor and writer are step-scoped, so I can inject the partition key and range (from-to) into them. In the processor I have one synchronized method and expected it to be run by one thread at a time, but that is not the case.
I set it up with 10 partitions, and all 10 item readers read the right partitioned range. The problem comes with the item processor. The code below has the same logic I use.
public class AccountProcessor implements ItemProcessor<Custom, Custom> {
    @Override
    public Custom process(Custom item) {
        createAccount(item);
        return item;
    }

    // account has unique constraints on username, gender, and email
    /*
     * When one thread executes this method, it creates one account
     * and saves it. If the next thread comes in and tries to save the same
     * account, it should find the account created by the first thread and
     * do an update. But that doesn't happen; instead findIfExist returns
     * null and it tries to do another insert of duplicate data.
     */
    private synchronized void createAccount(Custom item) {
        Account account = accountRepo.findIfExist(item.getUsername(), item.getGender(), item.getEmail());
        if (account == null) {
            // account doesn't exist yet: create it with a starting balance
            account = new Account();
            account.setUsername(item.getUsername());
            account.setGender(item.getGender());
            account.setEmail(item.getEmail());
            account.setMoney(10000);
        } else {
            // account already exists: charge it
            account.setMoney(account.getMoney() - 10);
        }
        accountRepo.save(account);
    }
}
The expected outcome is that only one thread runs this method at any given time, so that there are no duplicate insertions in the database and no DataIntegrityViolationException.
The actual result is that the second thread can't find the first account, tries to create a duplicate account and save it to the database, and that causes a DataIntegrityViolationException (unique constraint error).
Since I synchronized the method, the threads should execute it in order: the second thread should wait for the first thread to finish and then run, which means it should be able to find the first account.
I tried many approaches, such as a volatile set containing all unique accounts, calling saveAndFlush to commit as soon as possible, and using a ThreadLocal. None of these work.
I need some help.

Since you made the item processor step-scoped, you don't really need synchronization, as each step will have its own instance of the processor.
But it looks like you have a design problem rather than an implementation issue. You are trying to synchronize threads to act in a certain order in a parallel setup. When you decide to go parallel, divide the data into partitions and give each worker (either local or remote) a partition to work on, you must accept that these partitions will be processed in an undefined order, and that there should be no relation between the records of each partition or between the work done by each worker.
When 1 thread execute that method, it will create 1 account
and save it. If next thread comes in and try to save the same account,
it should find the account created by first thread and do one update. But now it doesn't happen, instead findIfExist return null and it try to do another insert of duplicate data
That's because the transaction of thread1 may not be committed yet, so thread2 won't find the record you think has been inserted by thread1.
It looks like you are trying to create or update some accounts with a partitioned setup. I'm not sure this setup is suitable for the problem at hand.
As a side note, I would not call accountRepo.save(account); in an item processor, but rather do that in an item writer.
Hope this helps.
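
To illustrate the first point (a synchronized instance method locks on its own instance, so separate step-scoped instances never block each other), here is a small plain-Java sketch. It does not use Spring Batch; the Processor class and the canOverlap helper are hypothetical names made up for the demonstration, and the 500 ms wait is an arbitrary timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class StepScopedLockSketch {
    /** Hypothetical processor; each step-scoped partition would get its own instance. */
    static class Processor {
        // synchronized locks on "this", so two instances never contend for the same monitor.
        synchronized void createAccount(CountDownLatch entered, CountDownLatch release) {
            entered.countDown();   // signal: this thread is now inside the method
            awaitQuietly(release); // stay inside until the caller lets go
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true if a second thread can enter createAccount while the first is still inside. */
    static boolean canOverlap(Processor first, Processor second) {
        CountDownLatch firstEntered = new CountDownLatch(1);
        CountDownLatch secondEntered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread t1 = new Thread(() -> first.createAccount(firstEntered, release));
        Thread t2 = new Thread(() -> second.createAccount(secondEntered, release));
        try {
            t1.start();
            firstEntered.await(); // thread 1 is now holding its monitor
            t2.start();
            // If thread 2 gets in while thread 1 is still inside, there is no mutual exclusion.
            return secondEntered.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let both threads finish
        }
    }

    public static void main(String[] args) {
        Processor shared = new Processor();
        System.out.println("same instance overlaps: " + canOverlap(shared, shared));
        System.out.println("separate instances overlap: "
                + canOverlap(new Processor(), new Processor()));
    }
}
```

Even with one shared instance, the lock only serializes the method body. It says nothing about when the surrounding transaction commits, which is the second point made above.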

Double instances in database after using EntityManager.merge() in Transient Method

I am new to Spring. My application, developed with Spring Roo, has a cron job that downloads some files every day and updates a database.
The update is done with merge(), after downloading and parsing the files.
An entity class Dataset has a list called resources. After the download I do:
dataset.setResources(resources);
dataset.merge();
and dataset.merge() does the following:
@Transactional
public Dataset Dataset.merge() {
    if (this.entityManager == null) this.entityManager = entityManager();
    Dataset merged = this.entityManager.merge(this);
    this.entityManager.flush();
    return merged;
}
I expected that calling dataset.setResources(resources); would overwrite the resources field, and so the database entries would be overwritten too.
But I get double entries in the database: every resource appears twice, with different (incremental) IDs.
How can I get my application to do updates rather than inserts? A naive solution would be to delete the old resources manually and then call merge(); is this the way to go, or is there a smarter solution?

This situation occurs when you use Hibernate as the persistence engine and your entities have a version field.
Normally the ID field is all that is needed to merge a detached object with its persistent state in the database, but Hibernate also takes the version field into account: if you don't set it (it is null), Hibernate discards the value of the ID field and creates a new object with a new ID.
To find out whether you are affected by this strange feature of Hibernate, set a value in the version field; if an exception is thrown, this is your problem. In that case the best solution is for the parsed data to contain the right version value. Other options are to disable version checking (see the Hibernate reference guide) or to load the persistent state before merging.

Getting SqlCeException on restart if I don't insert data to the DB

Basically, I have a LINQ database context and its model. As usual, I create the DB in the SQL context if the DB does not exist (the context is a singleton, and this is checked on every access to it).
Everything works well if I add data to the DB on the first launch. But if I don't insert any data during the first start of the app, on successive launches I get
SqlCeException:The specified table does not exist [TableName]
I don't know how to explain it more specifically, but the exception comes immediately whenever I do a LINQ query on the second launch of the app if I didn't insert any data on the first launch. If I do insert some data during the first launch, all is fine for the rest of the app's lifetime. Why would it be a bad thing to create the DB and introduce the DB context, but not insert any data?
Here's my LINQ DB model:
https://github.com/kypeli/Podcatcher/blob/master/wp7/Podcatcher/ViewModels/PodcastSubscriptionModel.cs
Here's where I get the exception on second start if I didn't insert any data on the first launch:
https://github.com/kypeli/Podcatcher/blob/master/wp7/Podcatcher/PodcastSqlModel.cs#L64
It also strikes me that there's no API call to check if a table exists or not in LINQ, so I would have to assume "this should just work" - but it doesn't.
Any ideas? Thanks! :)
Update: I verified analyzing the .sdf file that indeed there are no tables created if I don't insert any data upon first launch of the app. As I see it:
This is a bug in LINQ-to-SQL. It should not crash if there are no tables present, but know that it should create them. Or deal with the case and create tables only when data is inserted.
I would need to insert some dummy data into SQL always on first launch, or...
Check if a table exists, if not, react to it by forcing LINQ-to-SQL to create them. But how?

I've dealt with this problem too, and I fixed it this way.
Get the data context:
dbDataContext = new DBDataContext(DBConnectionString);
if( dbDataContext.DatabaseExists() == true)
//then try to get an entity:
System.Data.Linq.Table<Entity> entities = dbDataContext.Tablename;
//try to get an element from the entity:
IEnumerator<Entity> enumEntity = entities.GetEnumerator();
In this situation, entities.GetEnumerator(); will always raise the "table not found" exception.
Just use a try/catch, and in the catch block delete the DB and recreate it, because your DB is empty anyway :)
dbDataContext.DeleteDatabase();
dbDataContext.CreateDatabase();
dbDataContext.SubmitChanges();

Linq To SQL Without Explicit Foreign Key Relationships

I am working with a few legacy tables that have relationships, but those relationships haven't been explicitly set as primary/foreign keys. I created a .dbml file using "Linq To Sql Classes" and established the proper Case.CaseID = CaseInfo.CaseID association. My resulting class is CasesDataContext.
My Tables (One to many):
Case
------------------
CaseID (int not null)
MetaColumn1 (varchar)
MetaColumn2 (varchar)
MetaColumn3 (varchar)
...
CaseInfo
------------------
CaseInfoID (int)
CaseID (int nulls allowed)
CaseInfoMeta (varchar)
...
I'm new to LinqToSQL and am having trouble doing this:
CasesDataContext db = new CasesDataContext();
var Cases = from c in db.Cases
where c.CaseInfo.CaseInfoMeta == "some value"
select c;
(Edit) My problem being that CaseInfo or CaseInfos
is not available as a member of Cases.
I heard from a colleague that I might try ADO.Net Entity Data Model to create my Data Context class, but haven't tried that yet and wanted to see if I'd be wasting my time or should I go another route. Any tips, links, help would be most appreciated.

Go back to the designer and check that the relation is set up correctly. Here is a real-life example, where BillStateMasters has a "CustomerMasters1" property (the customers for the state):
P.S. The naming is being cleaned up ...
Update 1: You also need to make sure both tables have a primary key defined. If the primary key isn't defined in the database (and can't be defined for whatever reason), make sure to define it in the designer: open the column's properties and set it as a primary key. Note that entity tracking also won't work if the entity has no primary key, which means that changes such as deletes are silently not applied. So make sure to review all entities and give each of them a primary key (as I said, if it can't be in the DB, then in the designer).

CasesDataContext db = new CasesDataContext();
var Cases = from c in db.Cases
            join ci in db.CaseInfo
                on c.CaseID equals ci.CaseID   // outer key first, then inner key
            where ci.CaseInfoMeta == "some value"
            select new { CASE = c, INFO = ci };
My "join" LINQ is a bit rusty, but the above should get close to what you're after.

Is the association set to One to One or One to Many? If you have the association set to One to Many, then what you have is an EntitySet, not an EntityRef and you'll need to use a where clause on the dependent set to get the correct value. I suspect that you want a One to One relationship, which is not the default. Try changing it to One to One and see if you can construct the query.
Note: I'm just guessing because you haven't actually told us what the "trouble" actually is.

Your query looks correct and should return a query result set of Case objects.
So... what's the problem?
(Edit) My problem being that CaseInfo
is not available under Cases... i.e.
c.CaseInfo doesn't exist where I'm
assuming it would be if there were
explicit primary/foreign key
relationships.
What do you mean by "not available"? If you created the association in the designer as you say you did, then the query should generate SQL something along the lines of
SELECT [columns]
FROM Case INNER JOIN CaseInfo
ON Case.CaseID = CaseInfo.CaseID
WHERE CaseInfo.CaseInfoMeta = 'some value'
Have you debugged your linq query to get the SQL generated yet? What does it return?

Couple of things you might want to try:
Check the properties of the association. Make sure that the Parent property was created as Public. It does this by default, but something may have changed.
Since you're not getting CaseInfo on C, try typing it the other direction to see if you get ci.Case with intellisense.
Delete and recreate the association all together.
There's something very basic going wrong if the child members are not showing up. It might be best to delete the dbml and recreate the whole thing.
If all else fails, switch to NHibernate. :)

After a few tests, I'm pretty sure the FK relationships are required in the DB regardless of whatever associations are created in Linq-to-SQL. i.e. if you don't have them explicitly set in the DB, then you will have to do a join manually.

Is this C#? I think you need == instead of = on this line:
where c.CaseInfo.CaseInfoMeta = "some value"
should read
where c.CaseInfo.CaseInfoMeta == "some value"
