Customize Spring's JdbcBatchItemWriter to use different SQL for every record

I have a requirement where I will receive a flat file from a vendor and need to read the records and insert/update/delete them in my DB table. The vendor sends an action flag with each record indicating whether that particular record should be inserted, updated or deleted. The flat file will contain a huge number of records, and I do not want to handle every record manually, i.e. override the write() method of ItemWriter, loop over the items in the chunk checking the action flag, construct the SQL by hand and use JdbcTemplate to perform a DB operation for every record.
Can I achieve this using JdbcBatchItemWriter? Is there a way to set the SQL per record in the chunk so that Spring Batch still performs a batch update? And how can the ItemPreparedStatementSetter be invoked in that case?

Since your choice is at the record level, take a look at the ClassifierCompositeItemWriter (http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/support/ClassifierCompositeItemWriter.html). That ItemWriter implementation takes a Classifier implementation that it uses to determine which ItemWriter to use. From there, you can configure one ItemWriter that does inserts, one for updates, and one for deletes. Each record will be funneled through to the correct instance and assuming your delegates are JdbcBatchItemWriters, you'll get the same batching you normally do (one batch for inserts, one for updates, and one for deletes).
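For illustration, here is a minimal sketch of that wiring; the VendorRecord type, table and column names are placeholders, and in a real job these writers would typically be configured as Spring beans:

```java
import javax.sql.DataSource;

import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.support.ClassifierCompositeItemWriter;

public class VendorRecordWriterConfig {

    // Hypothetical item type read from the flat file: an id, a name and an action flag ("I"/"U"/"D").
    public static class VendorRecord {
        private Long id;
        private String name;
        private String actionFlag;
        public Long getId() { return id; }
        public String getName() { return name; }
        public String getActionFlag() { return actionFlag; }
    }

    // One JdbcBatchItemWriter per SQL statement; each delegate keeps its own JDBC batch.
    private JdbcBatchItemWriter<VendorRecord> writerFor(DataSource dataSource, String sql) throws Exception {
        JdbcBatchItemWriter<VendorRecord> writer = new JdbcBatchItemWriter<>();
        writer.setDataSource(dataSource);
        writer.setSql(sql);
        writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
        writer.afterPropertiesSet();
        return writer;
    }

    public ItemWriter<VendorRecord> classifierWriter(DataSource dataSource) throws Exception {
        JdbcBatchItemWriter<VendorRecord> insertWriter =
                writerFor(dataSource, "INSERT INTO my_table (id, name) VALUES (:id, :name)");
        JdbcBatchItemWriter<VendorRecord> updateWriter =
                writerFor(dataSource, "UPDATE my_table SET name = :name WHERE id = :id");
        JdbcBatchItemWriter<VendorRecord> deleteWriter =
                writerFor(dataSource, "DELETE FROM my_table WHERE id = :id");

        ClassifierCompositeItemWriter<VendorRecord> writer = new ClassifierCompositeItemWriter<>();
        // The Classifier routes each record to the delegate matching its action flag.
        writer.setClassifier(record -> {
            switch (record.getActionFlag()) {
                case "I": return insertWriter;
                case "U": return updateWriter;
                case "D": return deleteWriter;
                default:  throw new IllegalStateException("Unknown action flag: " + record.getActionFlag());
            }
        });
        return writer;
    }
}
```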

Related

Spring batch fetch huge amount of data from DB-A and store them in DB-B

I have the following scenario. In database A I have a table with a huge number of records (several million); these records increase very rapidly day by day (as many as 100,000 records a day).
I need to fetch these records, check whether they are valid and import them into my own database. On the first run I should take all the stored records; after that I can take only the newly saved records. I have a timestamp column I can use for this filter, but I can't figure out how to create a JpaPagingItemReader or a JdbcPagingItemReader and pass it a dynamic filter based on the date (e.g. select all records where the timestamp is greater than the last job execution date).
I'm using Spring Boot, Spring Data JPA and Spring Batch. I'm configuring the job with a chunk size of 1000. I can also use a paging query (is that useful if I use chunks?).
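To make it concrete, something like the sketch below is what I imagine (the table and column names are just placeholders), with the last execution date passed in as a job parameter, but I'm not sure this is the right way to do it:

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.ColumnMapRowMapper;

@Configuration
public class SourceReaderConfig {

    @Bean
    @StepScope
    public JdbcPagingItemReader<Map<String, Object>> sourceReader(
            DataSource dataSource,
            @Value("#{jobParameters['lastExecutionDate']}") Date lastExecutionDate) throws Exception {

        // Paging query with a named parameter for the timestamp filter.
        SqlPagingQueryProviderFactoryBean provider = new SqlPagingQueryProviderFactoryBean();
        provider.setDataSource(dataSource);
        provider.setSelectClause("SELECT id, payload, created_at");
        provider.setFromClause("FROM source_table");
        provider.setWhereClause("WHERE created_at > :lastExecutionDate");
        provider.setSortKey("id");

        Map<String, Object> parameters = new HashMap<>();
        parameters.put("lastExecutionDate", lastExecutionDate);

        JdbcPagingItemReader<Map<String, Object>> reader = new JdbcPagingItemReader<>();
        reader.setName("sourceReader");
        reader.setDataSource(dataSource);
        reader.setQueryProvider(provider.getObject());
        reader.setParameterValues(parameters);
        reader.setPageSize(1000); // matches the chunk size
        reader.setRowMapper(new ColumnMapRowMapper());
        return reader;
    }
}
```

The job would then be launched with something like new JobParametersBuilder().addDate("lastExecutionDate", lastRun).toJobParameters().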
I have a micro service (let's call this MSA) with all the business logic needed to check if records are valid and insert the valid records.
I have another service on a separate server. This service contains all the batch operation (let's call this MSB).
I'm wondering what the best approach to the batch is. I was thinking of these solutions:
in MSB I duplicate all the entities, repositories and services I use in MSA. Then in MSB I can make all the needed queries
in MSA I create all the REST APIs needed. The ItemProcessor of MSB will call these REST APIs to perform checks on the items to be processed, and finally in the ItemWriter I'll call the REST API for saving data
The first solution would avoid the HTTP calls, but it forces me to duplicate all the repositories and services between the two microservices. Sadly, I can't use a common project in which to place all the shared objects.
The second solution, on the other hand, would avoid the code duplication, but it would imply a lot of HTTP calls (above all in the ItemProcessor, to check whether an item is valid or not).
Do you have any other suggestion? Is there a better approach?
Thank you
Angelo

Spring Batch Framework

I am not able to decide whether the Spring Batch framework is applicable to the requirement below. I need expert input on this.
Following is my requirement:
Read multiple Oracle tables (at least 10, including both transaction and master tables), do complex calculations based on the business rules, and insert/update/delete records in the transaction tables.
I have identified the following two designs:
Design # 1:
ItemReader: Select eligible records from Key transaction table.
ItemProcessor: Fetch additional details from the DB using the key available in the record retrieved by the ItemReader (this would require multiple DB transactions).
Do the validation and computation and add the details to be written to DB as objects in a list.
ItemWriter: Write the details available in the objects using a CustomItemWriter (insert / update / delete operations)
With this design, we can achieve parallel processing but increase the number of DB transactions.
Design # 2:
Step # 1
ItemReader: Use Composite Item Reader (Group of ItemReaders) to read all the required tables.
ItemWriter: Save the result sets as lists of Objects (One list per table) in execution context
Step # 2
ItemReader: Retrieve the lists of objects available in the execution context and group them into one list of objects, based on the business processing, so that the processor can process them.
ItemProcessor:
Process the chunk of Objects returned by ItemReader.
Do the validation and computation and add the details to be written to DB as objects in a list.
ItemWriter: Write the details available in the objects using a CustomItemWriter (insert / update / delete operations)
With this design, we can REDUCE the number of DB transactions, but we delay the processing until all table records are retrieved and stored in the execution context, i.e. we are not using the parallel processing provided by Spring Batch.
Please advise whether the above is feasible using Spring Batch or whether we need to use a conventional Java program.
The good news is that your problem description matches a very common use case for spring-batch. The bad news is that the problem description is too generic to allow much meaningful input about the specific design beyond the comments already provided.
Spring-batch brings facilities similar to JCL and ISPF from the mainframe world into the java context.
Spring batch provides a framework for organizing and managing the boundaries of your process. It is a natural for a lot of ETL and bigdata operations, but it is not the only way to write these processes.
If your process can be broken down into discrete steps, then Spring Batch is a good choice for you.
The ItemReader should (logically) be an iterator returning a single object representing the start of one logical unit of work (LUW). The LUW objects are captured by the chunking mechanism and assembled into collections of the size you configure, and then passed to the processor. The result of the processor is then passed to the writer. In the context of an RDBMS-centric process, the commit happens at the end of the writer's operation.
What happens in each of those pieces of the step is 100% whatever you need (plain old java). The point of the framework is to free you from the complexity and enable you to solve the problem.
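A bare-bones sketch of such a chunk-oriented step (the step name, item types and chunk size here are arbitrary placeholders) might look like this:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

public class ChunkStepSketch {

    // I = the LUW object your reader returns; O = what your writer persists.
    public <I, O> Step processStep(StepBuilderFactory steps,
                                   ItemReader<I> reader,          // one LUW per call, null when exhausted
                                   ItemProcessor<I, O> processor, // plain old java: validate + compute
                                   ItemWriter<O> writer) {        // insert/update/delete; commit per chunk
        return steps.get("processStep")
                .<I, O>chunk(1000) // chunk size = commit interval
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
```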
From my understanding, Spring Batch has nothing to do with database batch operations (or at least the word 'batch' has a different meaning in these two contexts). Spring Batch is used to create processes with multiple steps, and it gives you the chance to restart a process if one of its steps fails (without repeating the previously finished steps).

Spring Batch: what is the best way to use the data retrieved in one TaskletStep in the processing of another step

I have a job in which:
The first step is a TaskletStep which retrieves some records (approx. 150-200) from a database table into a list.
The second step retrieves data from some other table and requires the list of records retrieved in the previous step for processing.
I came across three ways to do this:
1) putting the list retrieved in the first step into the StepExecutionContext and then promoting it to the JobExecutionContext to share data between steps
2) using Spring's caching support, i.e. @Cacheable
3) programmatically putting the list in the ApplicationContext
What is the best way to achieve this (it would be better if it could be explained with an example), keeping in mind two main concerns:
what happens if the volume of data retrieved in the first step increases, and performance.
Remember that objects in the step context are stored in the database, so you must be sure that the objects are serializable and that there are really only a few of them.
If you are sure, put the objects in your jobExecutionContext (as in solution 1; a sketch follows below) or use a bean holder (Passing data to future step); this type of approach is valid ONLY if the data in the first step is SMALL.
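A minimal sketch of that promotion approach (the key name, the Supplier-based loader and the String list are just placeholders for your own types):

```java
import java.util.List;
import java.util.function.Supplier;

import org.springframework.batch.core.listener.ExecutionContextPromotionListener;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class PromotionSketch {

    // Step 1 tasklet: load the small, serializable list and park it in the step's ExecutionContext.
    // recordLoader stands in for whatever fetches the ~150-200 records from the question.
    public Tasklet loadTasklet(Supplier<List<String>> recordLoader) {
        return (contribution, chunkContext) -> {
            List<String> records = recordLoader.get();
            chunkContext.getStepContext().getStepExecution()
                    .getExecutionContext().put("records", records);
            return RepeatStatus.FINISHED;
        };
    }

    // Register this listener on step 1 so the "records" key is promoted to the JobExecutionContext;
    // step 2 can then read it back, e.g. via @Value("#{jobExecutionContext['records']}") in a @StepScope bean.
    public ExecutionContextPromotionListener promotionListener() {
        ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
        listener.setKeys(new String[] {"records"});
        return listener;
    }
}
```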
Otherwise, you can process the data in step 2 without the data retrieval in step 1, and simply manage a cache of the step 1 data while processing the data in step 2; this way you don't need step 1 and don't need to store the step 1 data in the database, and looking up the step 1 data while processing millions of records in step 2 doesn't hurt processing time.
I hope I was clear, English is not my language

How does one design a spring batch job with a data source, possible concurrent steps and aggregation in the end?

I am new to Spring Batch and I have some doubts about how to implement a use case. My experience so far with Spring Batch is centered around jobs composed of tasklets with a reader, writer and processor. I feel, though, that the following use case is above my experience, so here goes:
I need to read from an mdb
I need to differentiate between the entries based on a combination of column values (this will yield a max of 5 combos)
Processing needs in the end to generate a collection of items of type T.
Everything needs to be merged in the end for some aggregations.
My idea is to avoid reading the mdb multiple times, so I was looking into a way of splitting the data based on the combos and then running the processes, maybe concurrently. With this in mind I read about the Splitter and partitioning components from Spring Batch and Spring Integration.
What I don't know exactly is how to put all these concepts together.
What do you mean by MDB? MessageDrivenBean? If the answer is yes - what do you mean by reading from the MDB multiple times? Since MDBs are message-driven, we can't read from them at an arbitrary time, so based on my understanding of your question I'd do it the following way:
The MDB receives a message and stores the received entry in some DB table - that would be a kind of transition table; such tables are often used during the processing of financial transactions.
Batch window comes - job is triggered.
Now you can query the table in any way you want. Since you are looking to split and process the data concurrently, I'd advise using Spring Batch partitioning with a TaskExecutorPartitionHandler executing the step locally in concurrent threads (see the sketch at the end of this answer). What you need to do is read the data from the database, differentiating on the combination of column values - that should be relatively easy; it's just a matter of constructing an appropriate SQL query.
Processed chunks are aggregated into the ItemWriter's write(List<? extends T> items) call depending on the commit interval; if such aggregation is not enough for you, I'd add another table and a batch step that aggregates the previously processed entries.
Basically that's how batch processing works - you read items, transform them and write them. The next step - if it's not just a simple tasklet - does exactly the same.
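For the partitioning part, a rough sketch of the local set-up mentioned above (the "combo" key, the names and the grid size are placeholders):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

public class PartitioningSketch {

    // One partition per combination of column values (at most 5, per the question).
    public Partitioner comboPartitioner(List<String> combos) {
        return gridSize -> {
            Map<String, ExecutionContext> partitions = new HashMap<>();
            for (String combo : combos) {
                ExecutionContext context = new ExecutionContext();
                context.putString("combo", combo); // worker reads it via #{stepExecutionContext['combo']}
                partitions.put("partition-" + combo, context);
            }
            return partitions;
        };
    }

    // Executes the worker step for every partition in local concurrent threads.
    public TaskExecutorPartitionHandler partitionHandler(Step workerStep) throws Exception {
        TaskExecutorPartitionHandler handler = new TaskExecutorPartitionHandler();
        handler.setStep(workerStep);
        handler.setTaskExecutor(new SimpleAsyncTaskExecutor());
        handler.setGridSize(5);
        handler.afterPropertiesSet();
        return handler;
    }
}
```

The master step would then be built with .partitioner("workerStep", comboPartitioner) and .partitionHandler(handler), and the worker step's reader would use the combo value from the step execution context in its WHERE clause.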

Best way to deal with multiple transactions to the database with Entity Framework

A bit of advice really: I am building an MVC application that takes in product feeds from multiple sources. This can run into millions of records, and despite my best advice to the client to split all his feeds into smaller chunks, I know they will probably try to do a thousand at a go.
Now the main problem is that I don't want to loop through every XML record and do an insert.
What I would rather do is queue up a stack of inserts and then send them to the database in one massive transaction, very much like a SQL import of a whole table.
Is this possible? If so, how, and what is it called?
Also, if I did want to re-insert repeated products again and again when nothing has changed, what would be the best practice for this? Could I maybe loop through an already-fetched dataset?
I'm not sure what is best to do here, so I'm asking: what is the consensus when it comes to a scenario like this?
thanks
With Entity Framework you will get a single DB insert per record you are inserting; there will be no bulk insert (if that is what you were looking for).
However, to enclose this in a transaction, you need do nothing but add your items to the context class.
http://msdn.microsoft.com/en-us/library/bb336792.aspx
This will automatically be wrapped in a transaction when you call SaveChanges. All you need to do is ensure you use a single context instance and .Add(yourObject) to that context.
So just wait to call SaveChanges until all of the objects have been added to the context.
