Spring Boot JPA save() method trying to insert existing row - spring-boot

I have a simple Kafka consumer that collects events and, based on the data in them, inserts or updates a record in the database. The table has a unique constraint on the ID column, which is also declared on the entity field.
Everything works fine when the table is pre-populated and inserts happen only every now and then. However, when I truncate the table and send a couple of thousand events with a limited number of IDs (I was using 50 unique IDs within 3k events), the events are processed simultaneously and the save() method randomly fails with a unique constraint violation exception. I debugged it and the outcome is pretty simple.
event1={id = 1 ... //somedata} gets picked up, the service method saveOrUpdateRecord() looks for the record by ID=1, finds none, and inserts a new record.
event2={id = 1 ... //somedata} gets picked up almost at the same time, the service method saveOrUpdateRecord() looks for the record by ID=1, finds none (the previous one is mid-insert), tries to insert, and fails with a constraint violation exception. It should instead find the record and merge it with the input from the event based on my conditions.
How can I get saveOrUpdateRecord() to run only after the previous call has fully executed, to prevent such behaviour? I really don't want to slow the Kafka consumer down with poll size etc.; I just want my service to execute one transaction at a time.
The service method:
public void saveOrUpdateRecord(Object input) {
    Object output = repository.findById(input.getId());
    if (output == null) {
        repository.save(input);
    } else {
        mergeRecord(input, output);
        repository.save(output);
    }
}
Will the @Transactional annotation on the method do the job?

Make your service thread safe.
Use this:
public synchronized void saveOrUpdateRecord(Object input) {
    Object output = repository.findById(input.getId());
    if (output == null) {
        repository.save(input);
    } else {
        mergeRecord(input, output);
        repository.save(output);
    }
}

Related

Within a Spring batch processor, how to commit entity to Database immediately on calling repository.save()

I am creating a Spring Batch process (Spring Boot 2) that reads a file and writes it to a database, one record at a time: read from the file, process it, and write (or update) to the database.
If a record for the same ID already exists in the DB, the process has to update the end date of the existing record and create a new record with a new start date. Below is the code:
public class Processor implements ItemProcessor<CelebVO, CelebVO> {

    @Autowired
    EndorseTableRepository endorseTableRepository;

    @Override
    @Transactional
    public CelebVO process(CelebVO celebVO) {
        CelebEndorsement celebEndorsement = endorseTableRepository.findAllByCelebIDAndBrandID(celebVO.getCelebID(), celebVO.getBrandID());
        if (celebEndorsement == null) {
            CelebEndorsement newEndorsement = new CelebEndorsement(celebVO);
            endorseTableRepository.save(newEndorsement);
        } else {
            celebEndorsement.setEndDate(celebVO.getEffDt().minusDays(1));
            endorseTableRepository.save(celebEndorsement);
            // create a new row with new start date
            CelebEndorsement newEndorsement = new CelebEndorsement(celebVO);
            newEndorsement.setStartDate(celebVO.getEffDt());
            endorseTableRepository.save(newEndorsement);
        }
        return celebVO;
    }
}
Below is the input txt file (CelebVO):
CelebID BrandID EffDt
J Lo Pepsi 2021-01-05
J Lo Pepsi 2021-05-30
Now, let's suppose we are starting with an empty EndorseTable. When the process picks up the file and reads the first record, it will see there are no records for CelebID 'J Lo', so it will insert a row into the DB.
Next, the process reads the second row and processes it. It should see that there is already a record in the table for J Lo, so it should set an end date on that record and then create a new record.
After this file is processed we should see two records in the table.
But that is not what is happening. Though I do a repository.save() for the first record, it is still not committed to the table. So when the process reads the second row, it doesn't find any rows in the table and ends up writing only one record.
I tried repository.saveAndFlush(). That doesn't help.
My chunk size is 1.
I tried removing @Transactional, but that breaks the code, so I kept it there.
The chunk-oriented processing model of Spring Batch commits a transaction per chunk, not per record. So in your case, if the insert and the update happen to fall in the same chunk, the processor won't see the change from the previous record, as the transaction has not been committed yet at that point.
Adding @Transactional on your processor's method is incorrect, because the processor is already executed within the scope of a transaction driven by Spring Batch. What you are trying to do would work if you set the commit interval to 1, but this would impact the performance of your step.
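For reference, here is a minimal sketch of what a commit interval of 1 would look like in the step definition (bean names and the reader/writer wiring are assumptions, not taken from the question):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class BatchStepConfig {

    @Bean
    public Step celebStep(StepBuilderFactory stepBuilderFactory,
                          ItemReader<CelebVO> celebReader,
                          Processor processor,
                          ItemWriter<CelebVO> celebWriter) {
        // chunk(1) makes Spring Batch commit after every record,
        // at the cost of one transaction per item.
        return stepBuilderFactory.get("celebStep")
                .<CelebVO, CelebVO>chunk(1)
                .reader(celebReader)
                .processor(processor)
                .writer(celebWriter)
                .build();
    }
}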
I had to modify the Entity class. I replaced
@ManyToOne(cascade = CascadeType.ALL)
with
@ManyToOne(cascade = {CascadeType.MERGE, CascadeType.DETACH})
and it worked.
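For context, a hedged sketch of where such a cascade change typically sits in the entity (the association field and its target type are assumptions, since the original mapping was not posted):

import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;

@Entity
public class CelebEndorsement {

    @Id
    @GeneratedValue
    private Long id;

    // Restricting the cascade types keeps a save of CelebEndorsement from
    // cascading a persist to the associated entity, which CascadeType.ALL would do.
    @ManyToOne(cascade = {CascadeType.MERGE, CascadeType.DETACH})
    private Celeb celeb;   // hypothetical association; the name is illustrative

    // dates, getters, setters, etc. omitted
}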

Spring Boot Manual Acknowledgement of kafka messages is not working

I have a Spring Boot Kafka consumer which consumes data from a topic, stores it in a database, and acknowledges the message once it is stored.
It works fine, but there is a problem if the application fails to get a DB connection after consuming the record: in that case we do not send the acknowledgement, yet the message is never consumed again unless we change the group id and restart the consumer.
My consumer looks like below:
@KafkaListener(id = "${group.id}", topics = {"${kafka.edi.topic}"})
public void onMessage(ConsumerRecord record, Acknowledgment acknowledgment) {
    boolean shouldAcknowledge = false;
    try {
        String tNo = getTrackingNumber((String) record.key());
        log.info("Check Duplicate By Comparing With DB records");
        if (!ediRecordService.isDuplicate(tNo)) { // this checks the record in my DB
            shouldAcknowledge = insertEDIRecord(record, tNo); // this returns true
        } else {
            log.warn("Duplicate record found.");
            shouldAcknowledge = true;
        }
        if (shouldAcknowledge) {
            acknowledgment.acknowledge();
        }
    } catch (Exception e) {
        // DB failure ends up here; no acknowledgment is sent
        log.error("Failed to process record", e);
    }
}
So, as you can see in the snippet above, we do not send the acknowledgment in that case.
That is not how Kafka offsets work here.
The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.
For example, on the first poll the consumer gets the message at offset 300; if it fails to persist it into the database because of some issue, it will not commit the offset.
So on the next poll it will get the next record at offset 301, and if it persists that data into the database successfully it will commit offset 301 (which means all records in that partition are considered processed up to that offset, in this example 301).
Solution for this: use a retry mechanism with a limited number of retries until the data is successfully stored in the database, or publish failed records to an error topic and reprocess them later, or save the offsets of failed records somewhere so you can reprocess them later.
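As a hedged sketch of the first two options combined (this assumes Spring Kafka 2.8+ and standard Spring Boot auto-configuration; none of it comes from the original answer), you can let the container retry a failed record a few times and then publish it to a dead-letter topic instead of blocking the partition:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Configuration
public class KafkaErrorHandlingConfig {

    // Retries a failed record 3 times, 1 second apart, then sends it to the
    // "<topic>.DLT" dead-letter topic so it can be reprocessed later.
    @Bean
    public DefaultErrorHandler errorHandler(KafkaTemplate<Object, Object> template) {
        DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(template);
        return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 3L));
    }
}

For this to kick in, the listener has to let the exception propagate instead of swallowing it.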

Assert a specific and unique value in column

My RDBMS is PostgreSQL. I am using Spring Boot and Hibernate as the JPA provider.
Let's consider a very simple one-column table:
Man
Age: integer
And I would like to implement a method that adds a man to the table. The method should satisfy the following condition:
At most one man in the table can be 80 years old.
@Transactional
void addMan(int age) {
    ....
}
It looks like I need to take an exclusive lock on the whole table, yes? How do I do that?
There is a second solution, besides locking the whole table.
Use SERIALIZABLE transactions that look like this:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM man WHERE age = 80;
Now, if the result is 0, continue:
INSERT INTO man VALUES (80);
COMMIT;
If two transactions try to do this concurrently, one will fail with a serialization error.
In that case, you just have to retry the transaction until it succeeds.
Serializable transactions perform worse than transactions with a lower isolation level, but they will still perform better than actually serializing the transactions.
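Translated to the Spring/JPA side, a minimal sketch could look like the following (ManRepository, countByAge and the Man constructor are assumed names, not part of the original answer):

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ManService {

    private final ManRepository manRepository;

    public ManService(ManRepository manRepository) {
        this.manRepository = manRepository;
    }

    // Runs the check-then-insert under SERIALIZABLE isolation. If PostgreSQL
    // aborts one of two concurrent transactions with a serialization error,
    // the failure surfaces as a data-access exception and the caller should
    // simply retry the call until it succeeds.
    @Transactional(isolation = Isolation.SERIALIZABLE)
    public void addMan(int age) {
        if (age == 80 && manRepository.countByAge(80) > 0) {
            throw new IllegalStateException("There is already a man aged 80");
        }
        manRepository.save(new Man(age));
    }
}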
I would add a unique constraint / index to the table and let the database do the work.
create unique index man_age_unique_idx on man (age);
If a record with that age does not already exist, no problem.
If it does exist, you should get back a PersistenceException where the cause is a Hibernate ConstraintViolationException. From that you can get the name of the constraint that was violated, e.g. man_age_unique_idx in the example above.
try {
    entityManager.persist(man);
} catch (PersistenceException e) {
    if (e.getCause() instanceof ConstraintViolationException) {
        ConstraintViolationException cve = (ConstraintViolationException) e.getCause();
        if (Objects.equals(cve.getConstraintName(), "man_age_unique_idx")) {
            // handle as appropriate... e.g. throw some custom business exception
            throw new DuplicateAgeException("Duplicate age: " + man.getAge(), e);
        }
    } else {
        // handle other causes ...
    }
}

Spring data Neo4j Affected row count

Considering a Spring Boot, Neo4j environment with Spring Data Neo4j 4, I want to perform a delete and get an error message when it fails to delete anything.
My problem is that since Repository.delete() returns void, I have no idea whether the delete modified anything or not.
First question: is there any way to get the number of rows affected by the last query? For example, in PL/SQL I could use SQL%ROWCOUNT.
So anyway, I tried the following code:
public void deletesomething(Long somethingId) {
    somethingRepository.delete(getExistingsomething(somethingId).getId());
}

private something getExistingsomething(Long somethingId, int depth) {
    return Optional.ofNullable(somethingRepository.findOne(somethingId, depth))
            .orElseThrow(() -> new somethingNotFoundException(somethingId));
}
In the code above I query the database to check that the value exists before I delete it.
Second question: do you recommend a different approach?
So now, just to add some complexity, I have a clustered database where db1 can only create, update and delete, and db2 and db3 can only read (this is ensured by the cluster sockets). db2 and db3 receive the data from db1 through the replication process.
From what I have seen so far, replication can take up to 90s, which means that for up to 90s the databases can be in different states.
Looking again at the code above:
public void deletesomething(Long somethingId) {
    somethingRepository.delete(getExistingsomething(somethingId).getId());
}
in debug that means:
getExistingsomething(somethingId).getId() // will hit db2
somethingRepository.delete(...) // will hit db1
So if replication has not yet inserted the value into db2, this code will throw the exception.
So the second question is: without changing those sockets, is there any way for me to delete and give the correct response?
This is not currently supported in Spring Data Neo4j; if you wish, please open a feature request.
In the meantime, perhaps the easiest workaround is to drop down to the OGM level of abstraction.
Create a class that is injected with org.neo4j.ogm.session.Session.
Use the query(cypher, parameters) method on Session; the Result it returns exposes the query statistics.
Example (in Kotlin, which was on hand):
fun deleteProfilesByColor(color: String) {
    val query = """
        MATCH (n:Profile {color: {color}})
        DETACH DELETE n;
    """
    val params = mutableMapOf(
        "color" to color
    )
    val result = session.query(query, params)
    val statistics = result.queryStatistics() // Use these!
}
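The same idea as a hedged Java sketch (the entity label, field and bean wiring are assumptions): queryStatistics() reports how many nodes were actually deleted, which is effectively the affected row count asked about above.

import java.util.Map;

import org.neo4j.ogm.model.Result;
import org.neo4j.ogm.session.Session;
import org.springframework.stereotype.Component;

@Component
public class ProfileDeleter {

    private final Session session;

    public ProfileDeleter(Session session) {
        this.session = session;
    }

    public int deleteProfilesByColor(String color) {
        // Parameter syntax matches the Kotlin example above; newer Neo4j
        // versions use the $color form instead.
        Result result = session.query(
                "MATCH (n:Profile {color: {color}}) DETACH DELETE n",
                Map.of("color", color));
        // Zero means the delete did not remove anything, so the caller can
        // raise an error instead of silently succeeding.
        return result.queryStatistics().getNodesDeleted();
    }
}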

Using "Any" or "Contains" when context not saved yet

Why isn't the exception triggered? Is LINQ's Any() not considering the new entries?
MyContext db = new MyContext();
foreach (string email in new[] { "asdf@gmail.com", "asdf@gmail.com" })
{
    Person person = new Person();
    person.Email = email;
    if (db.Persons.Any(p => p.Email.Equals(email)))
    {
        throw new Exception("Email already used!");
    }
    db.Persons.Add(person);
}
db.SaveChanges();
Shouldn't the exception be triggered on the second iteration?
The previous code is adapted for the question, but the real scenario is the following:
I receive an Excel file of persons and I iterate over it, adding every row as a person to db.Persons and checking that their emails aren't already used in the db. The problem arises when there are repeated emails within the worksheet itself (two rows with the same email).
Yes - queries (by design) are only computed against the data source. If you want to query in-memory items you can also query the Local store:
if (db.Persons.Any(p => p.Email.Equals(email)) ||
    db.Persons.Local.Any(p => p.Email.Equals(email)))
However - since YOU are in control of what's added to the store wouldn't it make sense to check for duplicates in your code instead of in EF? Or is this just a contrived example?
Also, throwing an exception for an already existing item seems like a poor design as well - exceptions can be expensive, and if the client does not know to catch them (and in this case compare the message of the exception) they can cause the entire program to terminate unexpectedly.
A call to db.Persons will always trigger a database query, but those new Persons are not yet persisted to the database.
I imagine if you look at the data in debug, you'll see that the new person isn't there on the second iteration. If you were to set MyContext db = new MyContext() again, it would be, but you wouldn't do that in a real situation.
What is the actual use case you need to solve? This example doesn't seem like it would happen in a real situation.
If you're comparing against the db, your code should work. If you need to prevent duplicates being entered, it should happen elsewhere: on the client, or by checking the C# collection before you start writing it to the db.
