Spring Batch - Updating different tables based on conditions

I'm not sure how to implement this, but the requirement is: say we have condition A or B.
If it's A, I have to update/delete/insert in tables D/E/F.
If it's B, I have to update/delete/insert in tables G/H/I.
The work should happen in a single transaction, meaning that under condition A, I have to finish updating all three tables - D, E, F. If any table operation fails, the transaction shouldn't be partially applied.
I was thinking of a classifier + composite item writer, but I'm not sure whether that runs in a single transaction.

Using a classifier makes sense only when items should or can be classified. So if your condition depends on the item type (i.e. its class), then a classifier + composite writer is the way to go. If you use a composite writer, the transaction will be around the composite writer, with all-or-nothing semantics for your insert/update/delete statements. You can find a complete example here: How does Spring Batch CompositeItemWriter manage transaction for delegate writers?
If your condition does not depend on item types (if it is, for example, a job parameter or a system property), then you can create a custom writer for it. This writer could delegate to two composite writers and call the appropriate one based on the condition.
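If the condition is, say, a job parameter, the custom writer could look something like the following. This is a minimal sketch, assuming Spring Batch 5's Chunk-based ItemWriter signature (on 4.x, write takes a List instead); the item type, the "mode" value, and the table assignments are invented:

```java
import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.CompositeItemWriter;

public class ConditionalDelegatingWriter<T> implements ItemWriter<T> {

    private final CompositeItemWriter<T> writersForA; // delegates writing to D, E, F
    private final CompositeItemWriter<T> writersForB; // delegates writing to G, H, I
    private final String mode;                        // e.g. taken from a job parameter

    public ConditionalDelegatingWriter(CompositeItemWriter<T> writersForA,
                                       CompositeItemWriter<T> writersForB,
                                       String mode) {
        this.writersForA = writersForA;
        this.writersForB = writersForB;
        this.mode = mode;
    }

    @Override
    public void write(Chunk<? extends T> items) throws Exception {
        // The chosen delegate runs inside the chunk transaction, so a failure
        // in any of its three writers rolls back every statement of the chunk.
        if ("A".equals(mode)) {
            writersForA.write(items);
        } else {
            writersForB.write(items);
        }
    }
}
```

Since this writer runs inside the chunk transaction managed by the step, the all-or-nothing guarantee is the same as with a plain composite writer.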

Related

Nested transactions in Spring and Hibernate

I have a Spring Boot application with persistence using Hibernate/JPA.
I am using transactions to manage my database persistence, and the @Transactional annotation to mark the methods that should execute transactionally.
I have three main levels of transaction granularity when persisting:
Batches of entities to be persisted
Single entities to be persisted
Single database operations that persist an entity
Therefore, you can imagine that I have three levels of nested transactions in the overall persistence flow.
The interaction between levels 2 and 3 works transparently, as I want: without specifying any propagation behaviour, the default is REQUIRED, so the entire entity (level 2) is rolled back because level 3 joins the transaction defined in level 2.
However, I need a slightly different interaction between levels 1 and 2: an entity should be rolled back individually if an error occurs, without rolling back the entire batch. So I need to specify a propagation behaviour in the level 2 annotation, @Transactional(propagation = X), that meets these requirements.
I've tried REQUIRES_NEW, but that doesn't work because it commits some of the level 2 entities even when the whole batch has to be rolled back, which can also happen.
The behaviour that seems to fit the description best is NESTED, but it is not accepted when using Spring with Hibernate JPA; see here for more information.
That last link offers alternatives to NESTED, but I would like to know whether NESTED would really have solved my problem, or whether another behaviour suits the job better.
I guess NESTED would roughly do what you want, but I would question whether it is really necessary. I don't know what you are trying to do or what the error condition is, but maybe you can eliminate the error condition with some kind of WHERE clause or an UPSERT statement: Hibernate Transactions and Concurrency Using attachDirty (saveOrUpdate)
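For what it's worth, here is roughly what the three levels would look like with NESTED on level 2, assuming a savepoint-capable transaction manager such as DataSourceTransactionManager over plain JDBC (which is exactly what Hibernate/JPA setups lack, as the question notes). The service split and MyEntity are invented; level 2 lives in its own bean so the call goes through the Spring proxy rather than being a self-invocation:

```java
import java.util.List;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class BatchService {

    private final EntityService entityService;

    public BatchService(EntityService entityService) {
        this.entityService = entityService;
    }

    // Level 1: the batch. REQUIRED (the default) opens the outer transaction.
    @Transactional
    public void persistBatch(List<MyEntity> batch) {
        for (MyEntity entity : batch) {
            try {
                entityService.persistEntity(entity);
            } catch (RuntimeException e) {
                // Only the savepoint for this entity was rolled back;
                // the rest of the batch can still commit.
            }
        }
    }
}

@Service
class EntityService {

    // Level 2: one entity. NESTED creates a JDBC savepoint, so a failure here
    // rolls back this entity alone, while an exception escaping level 1 still
    // rolls back the whole batch. NESTED requires a savepoint-capable
    // transaction manager, which is why Hibernate/JPA rejects it.
    @Transactional(propagation = Propagation.NESTED)
    public void persistEntity(MyEntity entity) {
        // Level 3 operations join this transaction (default REQUIRED).
    }
}

class MyEntity { /* placeholder for your entity */ }
```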

Create an ItemReader Class which gathers data from sub item readers

Currently I have multiple ItemReaders for individual database queries. I want to get all of that information into a single object. Is there a type of ItemReader that can do this for me? I'd like the processor and writer to handle that object after the reader reads it in as a single object. So basically I need to create an object from a group of ItemReaders and use it from that point on as my only data source to process and write.
One thing to note is that I cannot change the existing ItemReaders due to it being a larger part of a project.
Even if you cannot change the existing code, I would avoid using multiple readers and instead look at using an ORM framework like Hibernate in a brand-new reader.
Alternatively, create a new reader using JdbcCursorItemReader and just join the 3 (or more) tables together for your driving cursor, mapping the result set to your custom object. If the joins cause the cursor to return multiple rows per object, I would first suggest going back and looking at Hibernate. If Hibernate isn't an option, you could extend SingleItemPeekableItemReader to iterate through the cursor until you've fully built your objects.
Example using SingleItemPeekableItemReader.
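As a rough illustration of the peeking approach (a sketch only, not the linked example; OrderRow, Order, and getKey() are invented names), the wrapping reader keeps consuming rows while the next row still belongs to the same aggregate:

```java
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.support.SingleItemPeekableItemReader;

public class AggregatingReader implements ItemReader<Order> {

    private final SingleItemPeekableItemReader<OrderRow> delegate;

    public AggregatingReader(SingleItemPeekableItemReader<OrderRow> delegate) {
        this.delegate = delegate;
    }

    @Override
    public Order read() throws Exception {
        OrderRow row = delegate.read();
        if (row == null) {
            return null; // end of cursor, end of step
        }
        Order order = new Order(row);
        // Peek ahead: absorb rows until the key changes or the cursor ends.
        for (OrderRow next = delegate.peek();
             next != null && next.getKey().equals(row.getKey());
             next = delegate.peek()) {
            order.add(delegate.read());
        }
        return order;
    }
}

// Placeholder row/aggregate types for the sketch.
class OrderRow { String key; String getKey() { return key; } }
class Order {
    private final java.util.List<OrderRow> rows = new java.util.ArrayList<>();
    Order(OrderRow first) { rows.add(first); }
    void add(OrderRow row) { rows.add(row); }
}
```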

Customize Spring's JdbcBatchItemWriter to use different SQL for every record

I have a requirement where I receive a flat file from a vendor and need to read the records and insert/update/delete them in my DB table. The vendor supplies an action flag indicating whether a particular record should be inserted, updated, or deleted. The flat file will contain a huge number of records, and I don't want to do manual work like checking the action flag for every record (by overriding the write() method of ItemWriter and looping over the items in the chunk), constructing SQL by hand, and using JdbcTemplate to perform the DB operation for every record.
Can I achieve this using JdbcBatchItemWriter? Is there a way to set the SQL for every record in the chunk so that Spring Batch will do a batch update? How can the ItemPreparedStatementSetter be invoked in that case?
Since your choice is at the record level, take a look at ClassifierCompositeItemWriter (http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/support/ClassifierCompositeItemWriter.html). That ItemWriter implementation takes a Classifier implementation that it uses to determine which delegate ItemWriter to use. From there, you can configure one ItemWriter that does inserts, one for updates, and one for deletes. Each record will be funneled to the correct instance, and assuming your delegates are JdbcBatchItemWriters, you'll get the same batching you normally do (one batch for inserts, one for updates, and one for deletes).
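A sketch of that setup (Java 17 assumed; VendorRecord, its getAction() flag, and the SQL are invented, and the named parameters rely on beanMapped() against matching getters):

```java
import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.support.ClassifierCompositeItemWriter;

public class VendorWriterConfig {

    public ClassifierCompositeItemWriter<VendorRecord> writer(DataSource ds) {
        JdbcBatchItemWriter<VendorRecord> insert = jdbcWriter(ds,
                "INSERT INTO my_table (id, value) VALUES (:id, :value)");
        JdbcBatchItemWriter<VendorRecord> update = jdbcWriter(ds,
                "UPDATE my_table SET value = :value WHERE id = :id");
        JdbcBatchItemWriter<VendorRecord> delete = jdbcWriter(ds,
                "DELETE FROM my_table WHERE id = :id");

        ClassifierCompositeItemWriter<VendorRecord> writer = new ClassifierCompositeItemWriter<>();
        // Route each record to the right delegate based on its action flag;
        // each delegate still batches its own statements per chunk.
        writer.setClassifier(record -> switch (record.getAction()) {
            case "I" -> insert;
            case "U" -> update;
            default -> delete;
        });
        return writer;
    }

    private JdbcBatchItemWriter<VendorRecord> jdbcWriter(DataSource ds, String sql) {
        JdbcBatchItemWriter<VendorRecord> writer = new JdbcBatchItemWriterBuilder<VendorRecord>()
                .dataSource(ds)
                .sql(sql)
                .beanMapped() // binds :id/:value to the record's getters
                .build();
        writer.afterPropertiesSet(); // required since the delegates are not beans
        return writer;
    }
}

// Placeholder item type for the sketch.
class VendorRecord {
    private String id;
    private String value;
    private String action; // "I", "U" or "D" from the vendor file
    public String getId() { return id; }
    public String getValue() { return value; }
    public String getAction() { return action; }
}
```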

Spring Batch Framework

I am not able to decide whether the Spring Batch framework is applicable to the requirement below. I need expert input on this.
Following is my requirement:
Read multiple Oracle tables (at least 10, including both transaction and master tables), do complex calculations based on the business rules, and insert/update/delete records in the transaction tables.
I have identified the following two designs:
Design # 1:
ItemReader: Select eligible records from the key transaction table.
ItemProcessor: Fetch additional details from the DB using the key available in the record retrieved by the ItemReader (this would require multiple DB transactions).
Do the validation and computation, and add the details to be written to the DB as objects in a list.
ItemWriter: Write the details held in those objects using a custom ItemWriter (insert/update/delete operations).
With this design we can achieve parallel processing, but we increase the number of DB transactions.
Design # 2:
Step # 1
ItemReader: Use a composite item reader (a group of ItemReaders) to read all the required tables.
ItemWriter: Save the result sets as lists of objects (one list per table) in the execution context.
Step # 2
ItemReader: Retrieve the lists of objects from the execution context and group them into one list based on the business processing, so that the processor can process them.
ItemProcessor:
Process the chunk of objects returned by the ItemReader.
Do the validation and computation, and add the details to be written to the DB as objects in a list.
ItemWriter: Write the details held in those objects using a custom ItemWriter (insert/update/delete operations).
With this design we can REDUCE the number of DB transactions, but we delay processing until all table records are retrieved and stored in the execution context, i.e. we are not using the parallel processing provided by Spring Batch.
Please advise whether the above is feasible using Spring Batch, or whether we need to use a conventional Java program.
The good news is that your problem description matches a very common use case for Spring Batch. The bad news is that the problem description is too generic to allow much meaningful input on the specific design beyond the comments already provided.
Spring Batch brings facilities similar to JCL and ISPF from the mainframe world into the Java context.
Spring Batch provides a framework for organizing and managing the boundaries of your process. It is a natural fit for a lot of ETL and big-data operations, but it is not the only way to write these processes.
If your process can be broken down into discrete steps, then Spring Batch is a good choice for you.
The ItemReader should (logically) be an iterator returning a single object representing the start of one logical unit of work (LUW). The LUW object is captured by the chunking machinery and assembled into collections of the size you configure, which are then passed to the processor. The result of the processor is passed to the writer. In an RDBMS-centric process, the commit happens at the end of the writer's operation.
What happens in each of those pieces of the step is 100% whatever you need (plain old Java). The point of the framework is to free you from that complexity and let you focus on solving the problem.
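In code, the chunk flow just described boils down to a single chunk-oriented step. Here is a minimal sketch in the Spring Batch 5 builder style; KeyRecord, Result, and the bean wiring are placeholders:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class ProcessingStepConfig {

    @Bean
    public Step processingStep(JobRepository jobRepository,
                               PlatformTransactionManager txManager,
                               ItemReader<KeyRecord> reader,
                               ItemProcessor<KeyRecord, Result> processor,
                               ItemWriter<Result> writer) {
        // Each chunk of 100 items is read, processed, written, and then
        // committed as one unit, exactly as described above.
        return new StepBuilder("processingStep", jobRepository)
                .<KeyRecord, Result>chunk(100, txManager)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}

// Placeholder item types for the sketch.
class KeyRecord { }
class Result { }
```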
From my understanding, Spring Batch has nothing to do with database batch operations (or at least the word 'batch' has a different meaning in these two contexts). Spring Batch is used to create processes with multiple steps, and it gives you the ability to restart a process if one of the steps fails (without repeating the previously finished steps).

How does one design a spring batch job with a data source, possible concurrent steps and aggregation in the end?

I am new to Spring Batch and I have some doubts about how to implement a use case. My experience so far with Spring Batch is centered around jobs composed of steps with a reader, processor, and writer. I feel, though, that the following use case is above my experience, so here goes:
I need to read from an mdb
I need to differentiate between the entries based on a combination of column values (this will yield a max of 5 combos)
Processing needs, in the end, to generate a collection of items of type T.
Everything needs to be merged in the end for some aggregations.
My idea is to avoid reading the mdb multiple times, so I was looking into a way of splitting the data based on the combos and then running the processing, maybe concurrently. With this in mind I read about the Splitter and partitioning components of Spring Batch and Spring Integration.
What I don't know exactly is how to put all the concepts together.
What do you mean by MDB? A MessageDrivenBean? If the answer is yes - what do you mean by reading from the MDB multiple times? Since MDBs are message-driven, we can't read from them at an arbitrary time, so based on my understanding of your question I'd do it the following way:
The MDB receives a message and stores the received entry in some DB table - that would be a kind of transition table; such tables are often used during the processing of financial transactions
The batch window comes - the job is triggered.
Now you can query the table any way you want. Since you are looking to split and process the data concurrently, I'd advise using Spring Batch partitioning with a TaskExecutorPartitionHandler executing the step locally in concurrent threads. What you need to do is read the data from the database differentiating on the combination of column values - that should be relatively easy; it's just a matter of constructing the appropriate SQL query.
Processed chunks are aggregated in the ItemWriter's write(List<? extends T> items) method depending on the commit interval; if such aggregation is not enough for you, I'd add another table and a batch step that aggregates the previously processed entries.
Basically that's how batch processing works - you read items, transform them, and write them. The next step - if it's not just a simple tasklet - does exactly the same.
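To make the partitioning concrete, here is a minimal sketch of a Partitioner that creates one partition per column-value combination; the combo values and the "combo" key name are invented:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class ComboPartitioner implements Partitioner {

    // Up to 5 combinations of column values, as in the question.
    private final List<String> combos = List.of("C1", "C2", "C3", "C4", "C5");

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        for (String combo : combos) {
            ExecutionContext context = new ExecutionContext();
            // The worker step reads this value (e.g. via @StepScope and
            // #{stepExecutionContext['combo']}) to build its WHERE clause.
            context.putString("combo", combo);
            partitions.put("partition-" + combo, context);
        }
        return partitions;
    }
}
```

The manager step would plug this into a TaskExecutorPartitionHandler (or the partitioner(...)/taskExecutor(...) step builder DSL), so each partition's worker step runs in its own thread against its slice of the data.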
