JdbcMessageHandler batch update - how to avoid cancelling everything on one error - jdbc

Let's say the payload() is an ArrayList with n items, and one of the items has a duplicate primary key value (one that already exists in the table and causes a duplicate-key violation).
When this happens, none of the other, valid items are inserted into the database.
Is it possible for the batch operation to insert the valid items and push only the problematic ones to the errorChannel?
@Bean
@ServiceActivator(inputChannel=..)
public MessageHandler jdbcMessageHandler() {
    JdbcMessageHandler jdbcMessageHandler = new JdbcMessageHandler(dataSource, "INSERT INTO...");
    jdbcMessageHandler.setPreparedStatementSetter((ps, message) ->...
    return...
}

No, with such an assumption you would violate the purpose of a batch insert: all or nothing.
You may consider splitting that list upfront and adding an ExpressionEvaluatingRequestHandlerAdvice to that JdbcMessageHandler to handle individual errors.
See more info about advising request handlers in docs: https://docs.spring.io/spring-integration/docs/current/reference/html/messaging-endpoints.html#message-handler-advice-chain
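A minimal sketch of that suggestion, assuming annotation-driven configuration; the channel names, the DefaultMessageSplitter choice, and the DataSource wiring are assumptions, not part of the original question:

import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.integration.annotation.Splitter;
import org.springframework.integration.handler.advice.ExpressionEvaluatingRequestHandlerAdvice;
import org.springframework.integration.jdbc.JdbcMessageHandler;
import org.springframework.integration.splitter.DefaultMessageSplitter;
import org.springframework.messaging.MessageHandler;

@Bean
@Splitter(inputChannel = "batchChannel", outputChannel = "itemChannel")
public DefaultMessageSplitter splitter() {
    // Splits the ArrayList payload into one message per item.
    return new DefaultMessageSplitter();
}

@Bean
public ExpressionEvaluatingRequestHandlerAdvice individualErrorAdvice() {
    ExpressionEvaluatingRequestHandlerAdvice advice = new ExpressionEvaluatingRequestHandlerAdvice();
    advice.setFailureChannelName("errorChannel"); // failed items are routed here
    advice.setTrapException(true);                // do not rethrow, so the remaining items keep flowing
    return advice;
}

@Bean
@ServiceActivator(inputChannel = "itemChannel", adviceChain = "individualErrorAdvice")
public MessageHandler jdbcMessageHandler(DataSource dataSource) {
    // Each message now carries a single item, so every INSERT succeeds or fails on its own.
    return new JdbcMessageHandler(dataSource, "INSERT INTO ...");
}

With the list split upfront, a duplicate key only fails the one message that carries it; the advice traps that exception and sends the failed message to the errorChannel while the rest are inserted.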

Related

Within a Spring batch processor, how to commit entity to Database immediately on calling repository.save()

I am creating a Spring Batch process (Spring Boot 2) that reads a file and writes it to a database, one record at a time: read from the file, process it, and write (or update) it to the database.
If a record for the same ID already exists in the DB, the process has to update the end date of the existing record and create a new record with the new start date. Below is the code:
public class Processor implements ItemProcessor<CelebVO, CelebVO> {

    @Autowired
    EndorseTableRepository endorseTableRepository;

    @Override
    @Transactional
    public CelebVO process(CelebVO celebVO) {
        CelebEndorsement celebEndorsement = endorseTableRepository.findAllByCelebIDAndBrandID(celebVO.getCelebID(), celebVO.getBrandID());
        if (celebEndorsement == null) {
            CelebEndorsement newEndorsement = new CelebEndorsement(celebVO);
            endorseTableRepository.save(newEndorsement);
        } else {
            celebEndorsement.setEndDate(celebVO.getEffDt().minusDays(1));
            endorseTableRepository.save(celebEndorsement);
            // create a new row with the new start date
            CelebEndorsement newEndorsement = new CelebEndorsement(celebVO);
            newEndorsement.setStartDate(celebVO.getEffDt());
            endorseTableRepository.save(newEndorsement);
        }
        return celebVO;
    }
}
Below is the input txt file (CelebVO):
CelebID BrandID EffDt
J Lo Pepsi 2021-01-05
J Lo Pepsi 2021-05-30
Now, let's suppose we start with an empty EndorseTable. When the process picks up the file and reads the first record, it will see there are no records for CelebID 'J Lo', so it will insert a row into the DB.
Then the process reads the second row and processes it. It should see that there is already a record in the table for J Lo, so it should set an end date on that record and then create a new record.
After this file is processed we should see two records in the table.
But that is not what is happening. Though I do a repository.save() for the first record, it is still not committed to the table, so when the process reads the second row it doesn't find any rows in the table. It ends up writing only one record to the table.
I tried repository.saveAndFlush(). That doesn't help.
My chunk size is 1
I tried removing @Transactional. But that breaks the code. So I kept it there.
The chunk-oriented processing model of Spring Batch commits a transaction per chunk, not per record. So in your case, if the insert and the update happen to fall in the same chunk, the processor won't see the change made for the previous record, as the transaction is not committed yet at that point.
Adding @Transactional on your processor's method is incorrect, because the processor is already executed within the scope of a transaction driven by Spring Batch. What you are trying to do would work if you set the commit interval to 1, but this would impact the performance of your step.
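For reference, the commit interval is the chunk size given to the step builder. A minimal sketch, assuming Spring Batch 4 (Spring Boot 2); the step name and the reader/writer beans are placeholders:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;

@Bean
public Step celebStep(StepBuilderFactory stepBuilderFactory,
                      ItemReader<CelebVO> reader,
                      ItemProcessor<CelebVO, CelebVO> processor,
                      ItemWriter<CelebVO> writer) {
    return stepBuilderFactory.get("celebStep")
            .<CelebVO, CelebVO>chunk(1) // commit interval of 1: one transaction per record
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}

As noted above, a commit interval of 1 makes every record visible to the next one at the cost of one transaction per item.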
I had to modify the Entity class. I replaced
@ManyToOne(cascade = CascadeType.ALL)
with
@ManyToOne(cascade = {CascadeType.MERGE, CascadeType.DETACH})
and it worked.
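To put that change in context, a minimal sketch of where the cascade setting lives; the CelebEndorsement field and the Celeb target type are assumptions, since the original post doesn't show the entity:

import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.ManyToOne;

@Entity
public class CelebEndorsement {

    // MERGE and DETACH still propagate to the association, but unlike
    // CascadeType.ALL (which includes PERSIST) they don't try to re-persist
    // a row that already exists.
    @ManyToOne(cascade = {CascadeType.MERGE, CascadeType.DETACH})
    private Celeb celeb;

    // ...
}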

Spring Boot JPA save() method trying to insert existing row

I have a simple Kafka consumer that collects events and, based on the data in them, inserts or updates a record in the database - the table has a unique constraint on the ID column, which is also marked unique on the entity field.
Everything works fine when the table is pre-populated and inserts happen every now and then. However, when I truncate the table and send a couple thousand events with a limited number of IDs (I was doing 50 unique IDs within 3k events), the events are processed simultaneously and the save() method randomly fails with a unique constraint violation exception. I debugged it and the outcome is pretty simple:
event1={id = 1 ... //somedata} gets picked up, the service method saveOrUpdateRecord() looks for a record with ID=1, finds none, and inserts a new record.
event2={id = 1 ... //somedata} gets picked up almost at the same time, saveOrUpdateRecord() looks for a record with ID=1, finds none (the previous one is mid-insert), tries to insert, and fails with a constraint violation exception - it should have found the record and merged it with the input from the event based on my conditions.
How can I get saveOrUpdateRecord() to run only after the previous call has fully executed, to prevent such behaviour? I really don't want to slow the Kafka consumer down with poll size etc.; I just want my service to execute one transaction at a time.
The service method:
public void saveOrUpdateRecord(Object input) {
    Object output = repository.findById(input.getId()).orElse(null);
    if (output == null) {
        repository.save(input);
    } else {
        mergeRecord(input, output);
        repository.save(output);
    }
}
Will a @Transactional annotation on the method do the job?
Make your service thread safe.
Use this:
public synchronized void saveOrUpdateRecord(Object input) {
    Object output = repository.findById(input.getId()).orElse(null);
    if (output == null) {
        repository.save(input);
    } else {
        mergeRecord(input, output);
        repository.save(output);
    }
}

PageRequest and OrderBy method name Issue

In our Spring application we have a table that contains a lot of "Payment" records. We need a query that pages through the results, sorted from the record with the largest total to the smallest, and we are facing an error: sometimes the same record is contained in two successive pages.
We are creating a PageRequest that is passed to the repository. Here is our implementation:
Repository:
public interface StagingPaymentEntityRepository extends JpaRepository<StagingPaymentEntity, Long> {
Page<StagingPaymentEntity> findAllByStatusAndCreatedDateLessThanEqualAndOperationTypeOrderByEffectivePaymentDesc(String status, Timestamp batchStartTimestamp, String operationType, Pageable pageable);
}
public class BatchThreadReiteroStorni extends ThreadAbstract<StagingPaymentEntity> {
    PageRequest pageRequest = PageRequest.of(index, 170);
    Page<StagingPaymentEntity> records = ((StagingPaymentEntityRepository) repository).findAllByStatusAndCreatedDateLessThanEqualAndOperationTypeOrderByEffectivePaymentDesc("REITERO", batchStartTimestamp, "STORNO", pageRequest);
}
where index is the index of the page we are requesting.
Is there a way to understand why this is happening? Thanks for the support.
This can have multiple reasons.
Non-deterministic ordering: if the ordering you are using isn't deterministic, i.e. there are rows that might come back in any order, that order might change between selects, resulting in items getting skipped or returned multiple times. Fix: add the primary key as a last column to the ordering (see the sketch after this answer).
If you change the entities in a way that affects the ordering, or another process does that, you might end up with items getting processed multiple times.
In this scenario I see a couple of approaches:
Do value-based pagination, i.e. don't select pages but select the next N rows after the last value you processed.
Instead of paging, use a Stream; this allows a single select while still processing the results one element at a time. You might have to flush and evict entities, and I'm not 100% sure that works, but it is certainly worth a try.
Finally, you can mark all rows that you want to process in a separate column, then select N marked entities and unmark them once they are processed.
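A minimal sketch of the first fix (a deterministic sort with the primary key as tiebreaker). It assumes the entity's primary key field is named id, and it renames the derived query to drop the OrderBy suffix so that the Sort inside the Pageable drives the ordering; both are assumptions, not code from the original post:

import java.sql.Timestamp;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import org.springframework.data.jpa.repository.JpaRepository;

public interface StagingPaymentEntityRepository extends JpaRepository<StagingPaymentEntity, Long> {

    // No static OrderBy suffix: the ordering comes entirely from the Pageable.
    Page<StagingPaymentEntity> findAllByStatusAndCreatedDateLessThanEqualAndOperationType(
            String status, Timestamp batchStartTimestamp, String operationType, Pageable pageable);
}

// Usage: descending total, with the primary key as a deterministic tiebreaker.
PageRequest pageRequest = PageRequest.of(index, 170,
        Sort.by(Sort.Order.desc("effectivePayment"), Sort.Order.asc("id")));

Because two rows with an equal effectivePayment now always come back in the same relative order, a row can no longer migrate between pages mid-scan.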

Assert a specific and unique value in column

My RDBMS is PostgreSQL. I am using Spring Boot and Hibernate as the JPA provider.
Let's consider a very simple one-column table:
Man
Age: integer
I would like to implement a method that adds a man to the table and satisfies the following condition: at most one man in the table can be 80 years old.
@Transactional
void addMan(int age) {
    ....
}
It looks like I need to take an exclusive lock on the whole table, yes? How do I do that?
There is a second solution.
Use SERIALIZABLE transactions that look like this:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM man WHERE age = 80;
Now, if the result is 0, continue:
INSERT INTO man VALUES (80);
COMMIT;
If two transactions try to do this concurrently, one will fail with a serialization error.
In that case, you just have to retry the transaction until it succeeds.
Serializable transactions perform worse than transactions with a lower isolation level, but they will still perform better than actually serializing the transactions.
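A minimal sketch of that retry pattern in Spring, assuming a JdbcTemplate-based service; the class, bean, and exception-handling details are illustrative, not from the original answer:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.dao.CannotSerializeTransactionException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ManService {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Transactional(isolation = Isolation.SERIALIZABLE)
    public void addMan(int age) {
        if (age == 80) {
            Integer count = jdbcTemplate.queryForObject(
                    "SELECT count(*) FROM man WHERE age = 80", Integer.class);
            if (count != null && count > 0) {
                throw new IllegalStateException("a man aged 80 already exists");
            }
        }
        jdbcTemplate.update("INSERT INTO man VALUES (?)", age);
    }
}

// The caller retries until the transaction gets through; PostgreSQL's
// serialization failure (SQLSTATE 40001) is translated by Spring into
// CannotSerializeTransactionException.
while (true) {
    try {
        manService.addMan(80);
        break;
    } catch (CannotSerializeTransactionException e) {
        // a concurrent transaction won; just retry
    }
}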
I would add a unique constraint / index to the table and let the database do the work.
create unique index man_age_unique_idx on man (age);
If a record with that age does not already exist, no problem.
If it does exist, you should get back a PersistenceException where the cause is a Hibernate ConstraintViolationException. From that you can get the name of the constraint that was violated, e.g. man_age_unique_idx in the example above.
try {
    entityManager.persist(man);
} catch (PersistenceException e) {
    if (e.getCause() instanceof ConstraintViolationException) {
        ConstraintViolationException cve = (ConstraintViolationException) e.getCause();
        if (Objects.equals(cve.getConstraintName(), "man_age_unique_idx")) {
            // handle as appropriate... e.g. throw some custom business exception
            throw new DuplicateAgeException("Duplicate age: " + man.getAge(), e);
        }
    } else {
        // handle other causes ...
    }
}

Using "Any" or "Contains" when context not saved yet

Why isn't the exception triggered? Is LINQ's Any() not considering the new entries?
MyContext db = new MyContext();
foreach (string email in new[] { "asdf@gmail.com", "asdf@gmail.com" })
{
    Person person = new Person();
    person.Email = email;
    if (db.Persons.Any(p => p.Email.Equals(email)))
    {
        throw new Exception("Email already used!");
    }
    db.Persons.Add(person);
}
db.SaveChanges();
Shouldn't the exception be triggered on the second iteration?
The previous code is adapted for the question, but the real scenario is the following:
I receive an Excel file of persons and I iterate over it, adding every row as a person to db.Persons, checking that their emails aren't already used in the db. The problem is when there are repeated emails in the worksheet itself (two rows with the same email).
Yes - queries are (by design) only computed against the data source. If you want to query in-memory items you can also query the Local store:
if (db.Persons.Any(p => p.Email.Equals(email)) ||
    db.Persons.Local.Any(p => p.Email.Equals(email)))
However - since YOU are in control of what's added to the store wouldn't it make sense to check for duplicates in your code instead of in EF? Or is this just a contrived example?
Also, throwing an exception for an already existing item seems like a poor design as well - exceptions can be expensive, and if the client does not know to catch them (and in this case compare the message of the exception) they can cause the entire program to terminate unexpectedly.
A call to db.Persons will always trigger a database query, but those new Persons are not yet persisted to the database.
I imagine if you look at the data in debug, you'll see that the new person isn't there on the second iteration. If you were to set MyContext db = new MyContext() again, it would be, but you wouldn't do that in a real situation.
What is the actual use case you need to solve? This example doesn't seem like it would happen in a real situation.
If you're comparing against the db, your code should work. If you need to prevent duplicates from being entered, it should happen elsewhere - on the client, or by checking the C# collection before you start writing it to the db.
