How to implement chunk processing using custom ItemReader - spring

I am using Spring Batch 2.1.9.RELEASE.
I need to configure a job step that reads data from a MySQL DB, processes it, and writes it back to MySQL. I want to do it in chunks.
I considered using JdbcCursorItemReader, but the SQL is complex: I need to fetch data from three other tables to build the actual SQL to use in the reader.
But if I use a custom ItemReader with JdbcTemplate/NamedParameterJdbcTemplate, how can I make sure the step processes the data in chunks? I am not using JPA/DAOs.
Many thanks,

In Spring Batch, data is normally processed in chunks; the easy way is to declare a commit-interval in the step definition (see Configuring a step).
Another way is to define a custom chunk policy by implementing your own CompletionPolicy.
To answer your question: use one of the Driving Query Based ItemReaders to read from the main table and build the complex object (reading from the other tables), define a commit-interval, and use the standard read/process/write step pattern.
I hope I was clear, English is not my language.
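A minimal sketch of the driving-query idea, written in plain Java with no Spring dependency so it can be run on its own. SimpleItemReader is a hypothetical stand-in that only mirrors the ItemReader contract (read() returns null when the input is exhausted); the key list and the "order-" loader are made-up placeholders for what would be JdbcTemplate queries in a real job.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the ItemReader contract:
// read() returns the next item, or null when the input is exhausted.
interface SimpleItemReader<T> {
    T read();
}

// Driving-query reader: iterates over keys fetched up front and
// builds one full item per key on each read() call.
class DrivingQueryReader<K, T> implements SimpleItemReader<T> {
    private final Iterator<K> keys;
    private final Function<K, T> detailLoader;

    DrivingQueryReader(List<K> keys, Function<K, T> detailLoader) {
        this.keys = keys.iterator();
        this.detailLoader = detailLoader;
    }

    @Override
    public T read() {
        // One item per call; the step, not the reader, groups items into chunks.
        return keys.hasNext() ? detailLoader.apply(keys.next()) : null;
    }
}

class DrivingQueryDemo {
    public static void main(String[] args) {
        // Assumed data: in a real job the keys would come from the driving SQL
        // and the loader would run the per-key lookups against the other tables.
        List<Integer> ids = Arrays.asList(1, 2, 3);
        SimpleItemReader<String> reader =
                new DrivingQueryReader<>(ids, id -> "order-" + id);

        String item;
        while ((item = reader.read()) != null) {
            System.out.println(item);
        }
    }
}
```

The point of the sketch is that the reader never deals with chunk sizes: it returns one item per call, and the commit-interval on the step decides how many items make up a chunk.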

Related

Create an ItemReader Class which gathers data from sub item readers

Currently I have multiple Item readers for individual Database queries. I want to get all of that information into a single Object. Is there a type of ItemReader which can do this for me? I'd like the Processor and Writer to handle that object after the reader reads in as a single object. So basically I need to create an object from a group of ItemReaders and use it from that point on as my only data source to process and write.
One thing to note is that I cannot change the existing ItemReaders due to it being a larger part of a project.
Even if you cannot change the existing code, I would avoid using multiple readers and instead look at using an ORM framework like Hibernate in a brand new reader.
Alternatively, create a new reader using JdbcCursorItemReader and just join the 3 (or more) tables together for your driving cursor and map the result set to your custom object. If the joins cause the cursor to return multiple rows per object, I would first suggest going back and looking at Hibernate. If Hibernate isn't an option, you could extend the SingleItemPeekableItemReader to iterate through the cursor until you've fully built your objects.
Example using SingleItemPeekableItemReader.
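The peek-and-group technique can be sketched roughly as follows. This is plain Java that imitates the look-ahead idea rather than extending SingleItemPeekableItemReader itself; JoinedRow, Parent and GroupingReader are hypothetical names, and the rows are assumed to arrive sorted by parent key (for example via ORDER BY in the driving SQL).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// One flat row of the joined cursor: parent key plus one child value.
class JoinedRow {
    final int parentId;
    final String child;

    JoinedRow(int parentId, String child) {
        this.parentId = parentId;
        this.child = child;
    }
}

// Aggregated item handed on to the processor and writer.
class Parent {
    final int id;
    final List<String> children = new ArrayList<>();

    Parent(int id) {
        this.id = id;
    }
}

// Groups consecutive rows with the same parentId into one Parent.
class GroupingReader {
    private final Iterator<JoinedRow> rows;
    private JoinedRow peeked; // look-ahead row that has not been consumed yet

    GroupingReader(Iterator<JoinedRow> rows) {
        this.rows = rows;
    }

    private JoinedRow peek() {
        if (peeked == null && rows.hasNext()) {
            peeked = rows.next();
        }
        return peeked;
    }

    // Returns the next fully built Parent, or null when no rows remain.
    Parent read() {
        JoinedRow first = peek();
        if (first == null) {
            return null;
        }
        Parent parent = new Parent(first.parentId);
        while (peek() != null && peek().parentId == parent.id) {
            parent.children.add(peeked.child);
            peeked = null; // consume the row so the next peek() advances
        }
        return parent;
    }
}

class GroupingDemo {
    public static void main(String[] args) {
        List<JoinedRow> rows = Arrays.asList(
                new JoinedRow(1, "a"), new JoinedRow(1, "b"), new JoinedRow(2, "c"));
        GroupingReader reader = new GroupingReader(rows.iterator());
        Parent p;
        while ((p = reader.read()) != null) {
            System.out.println(p.id + " -> " + p.children);
        }
    }
}
```

The row that ends one group is kept in the look-ahead slot and becomes the first row of the next group, so nothing is lost at group boundaries.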

Spring Batch Framework

I am not able to decide whether the Spring Batch framework is applicable to the requirement below. I need expert input on this.
Following is my requirement:
Read multiple Oracle tables (at least 10 tables, including both transaction and master tables), do complex calculations based on the business rules, and insert/update/delete records in transaction tables.
I have identified the following two designs:
Design # 1:
ItemReader: Select eligible records from Key transaction table.
ItemProcessor: Fetch additional details from DB using the key available in the record retrieved by ItemReader. (It would require multiple DB transactions.)
Do the validation and computation and add the details to be written to DB as objects in a list.
ItemWriter: Write the details available in objects using CustomItemWriter(insert / update / delete operation)
With this design, we can achieve parallel processing but increase the number of DB transactions.
Design # 2:
Step # 1
ItemReader: Use Composite Item Reader (Group of ItemReaders) to read all the required tables.
ItemWriter: Save the result sets as lists of Objects (One list per table) in execution context
Step # 2
ItemReader: Retrieve lists of Objects available in execution context and group them into one list of objects based on the business processing so that processor can process them.
ItemProcessor: Process the chunk of Objects returned by ItemReader.
Do the validation and computation and add the details to be written to DB as objects in a list.
ItemWriter: Write the details available in objects using CustomItemWriter(insert / update / delete operation)
With this design, we can REDUCE the number of DB transactions, but we are delaying the processing until all table records are retrieved and stored in the execution context, i.e. we are not using the parallel processing provided by Spring Batch.
Please advise whether the above is feasible using SpringBatch or we need to use conventional Java program.
The good news is that your problem description matches a very common use case for spring-batch. The bad news is that the problem description is too generic to allow much meaningful input about the specific design beyond the comments already provided.
Spring-batch brings facilities similar to JCL and ISPF from the mainframe world into the java context.
Spring batch provides a framework for organizing and managing the boundaries of your process. It is a natural fit for a lot of ETL and big data operations, but it is not the only way to write these processes.
If your process can be broken down into discrete steps, then spring batch is a good choice for you.
The ItemReader should (logically) be an iterator returning a single object representing the start of one logical unit of work (LUW). The LUW objects are collected by the chunking loop into collections of the size you configure; each one is run through the processor, and the results are then passed to the writer. In the context of an RDBMS-centric process, the commit happens at the end of the writer's operation.
What happens in each of those pieces of the step is 100% whatever you need (plain old java). The point of the framework is to free you from the complexity and enable you to solve the problem.
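The read/process/write cycle described above can be sketched as a simplified loop. This is an illustration only, not Spring Batch's actual implementation; ChunkLoop and its run method are hypothetical names, and the "commit" is just a comment marking where a real step would commit its transaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Simplified chunk-oriented loop (illustration, not the framework's code).
class ChunkLoop {
    // Reads up to chunkSize items, processes each one, then writes the chunk.
    // Returns the number of chunks written (one "commit" per chunk).
    static <I, O> int run(Supplier<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int chunks = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize) {
                I item = reader.get(); // null signals end of input
                if (item == null) {
                    exhausted = true;
                    break;
                }
                inputs.add(item);
            }
            if (inputs.isEmpty()) {
                break;
            }
            List<O> outputs = new ArrayList<>();
            for (I item : inputs) {
                outputs.add(processor.apply(item)); // processor sees one item at a time
            }
            writer.accept(outputs); // a real step would commit the transaction here
            chunks++;
        }
        return chunks;
    }
}

class ChunkLoopDemo {
    public static void main(String[] args) {
        Iterator<Integer> source = Arrays.asList(1, 2, 3, 4, 5).iterator();
        int chunks = ChunkLoop.<Integer, Integer>run(
                () -> source.hasNext() ? source.next() : null,
                i -> i * 10,
                out -> System.out.println("writing " + out),
                2);
        System.out.println("chunks: " + chunks);
    }
}
```

With five items and a chunk size of 2, the writer is called three times, the last time with a single item.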
From my understanding, Spring batch has nothing to do with database batch operations (or at least the word 'batch' has a different meaning in these two contexts..) Spring batch is used to create processes with multiple steps, and gives you the chance to restart a process if one of the process steps fails (without repeating the previously finished process steps.)

BIRT Scripted Data Source using existing JDBC DataSource

I know that my overall problem is generally approached using two of the more common solutions such as a join data set or a sub-table, sub-report. I have looked at those and I am not sure this will work effectively.
Background:
JDBC data source has local data which includes a series of id's that reference a record in a master data repository interfaced via a web service. This is where the need for a scripted data source arises. The data can be filtered on either attributes within the local JDBC data and/or the extended data from the web service. The complication is that my only interface is the id argument to the webservice.
Ideal Solution:
Aside from creating a reporting table or other truly desirable scenarios, I am looking to create a unified data source through a single scripted data source that will handle all the complexities. This leaves the report generation and parameter creation a bit cleaner, hopefully. The idea is to leverage the JDBC query as well as the web service queries in the scripted data source to do the filtering and joins and create that singular unified view.
I tried using the following code as a reference for using the existing JDBC connection in the BIRT report definition to execute the query. However, the code came from beforeFactory and was written for a completely different purpose, so I think my breakdown of what belongs in open versus fetch may be the problem. The truth is I see no errors; it just returns 0 records.
a link
I have also found a code snippet to dynamically load a JDBC connection but that seems a bit obtuse and a ton of overhead for what I am needing to do. a link
In short: How in all-that-is-holy do you simply run a query against a database within a scripted data source, if you wanted to? The merit of doing that is another issue, but technically how?
Thanks in Advance!

How does one design a spring batch job with a data source, possible concurrent steps and aggregation in the end?

I am new to spring batching and I'm having some doubts on how to implement a use case. My experience so far with spring batching is centered around jobs composed of tasklets with reader, writer and processor. I feel though that the following use case is above my experience so here goes:
I need to read from an mdb
I need to differentiate between the entries based on a combination of column values(will yield a max of 5 combos)
Processing needs in the end to generate a collection of items of type T.
Everything needs to be merged in the end for some aggregations.
My idea is to avoid reading the mdb multiple times, so I was looking into a way of splitting the data based on combos and then running the processes, maybe concurrently. With this in mind I read about the Splitter and partitioning components from spring batching and integration.
What I don't exactly know is how to put all the concepts together.
What do you mean by MDB? MessageDrivenBean? If the answer is yes, what do you mean by reading from the MDB multiple times? Since MDBs are message-driven, we can't read from them at any time, so based on my understanding of your question I'd do it in the following way:
MDB receives message and stores received entry in some DB table - that would be some kind of transition table; such tables are often used during processing of financial transactions
Batch window comes - job is triggered.
Now you can query the table in any way you want. Since you are looking to split and process the data concurrently, I'd advise using Spring Batch partitioning with TaskExecutorPartitionHandler executing the step locally in concurrent threads. What you need to do is read data from the database, differentiating on the combination of column values. That should be relatively easy: it's just a matter of constructing the appropriate SQL query.
Processed items are handed to the ItemWriter's write(List<? extends T> items) one chunk at a time, with the chunk size set by the commit interval; if that level of aggregation is not enough for you, I'd add another table and a batch step that aggregates the previously processed entries.
Basically that's how batch processing works: you read items, transform them and write them. The next step, if it's not just a simple tasklet, does exactly the same.
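The split, process concurrently, then aggregate flow can be sketched in plain Java as below. This uses java.util.concurrent directly instead of Spring Batch partitioning, so it only illustrates the shape of the solution; SourceEntry, PartitionedAggregation and the summing "processing" are all hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

// One entry read from the source; combo stands for the column-value combination.
class SourceEntry {
    final String combo;
    final int amount;

    SourceEntry(String combo, int amount) {
        this.combo = combo;
        this.amount = amount;
    }
}

class PartitionedAggregation {
    // Splits entries by combo, sums each partition on its own thread,
    // then merges the partial sums into one total (the final aggregation).
    static int totalByPartitions(List<SourceEntry> entries) {
        Map<String, List<SourceEntry>> partitions =
                entries.stream().collect(Collectors.groupingBy(e -> e.combo));
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<Integer>> partials = new ArrayList<>();
            for (List<SourceEntry> partition : partitions.values()) {
                Callable<Integer> task =
                        () -> partition.stream().mapToInt(e -> e.amount).sum();
                partials.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> partial : partials) {
                total += partial.get(); // blocks until that partition is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while aggregating", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("partition failed", e);
        } finally {
            pool.shutdown();
        }
    }
}

class PartitionDemo {
    public static void main(String[] args) {
        List<SourceEntry> data = Arrays.asList(
                new SourceEntry("A", 1), new SourceEntry("B", 2), new SourceEntry("A", 3));
        System.out.println(PartitionedAggregation.totalByPartitions(data));
    }
}
```

The source is read once, each combo is worked on independently, and the merge happens only after every partition has finished, which is the same ordering a partitioned step followed by an aggregation step would give.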

Converting J2EE App from Sql to Oracle - suggestions for an efficient approach

We have a J2EE app built on Struts2+spring+iBatis; not all DAO's use iBatis...some code still uses the old JDBC approach of interacting with Database. All our DAO's call Stored Procedures, we do not have any inline SQL. Since Oracle Stored Procedures return cursors, we have to drastically change our code.
It is fairly easy for us to convert current iBatis mappings (in sql) to oracle (used a groovy script to do this) also it is easy to convert Java code that was calling old mappings that were in sql.
Our problem is converting the old DAOs that still use the JDBC approach. Since we will have to modify them anyway (because we are now using Oracle), we are thinking about converting them to iBatis mappings. Is this a good approach? This will be a huge effort on our side...
What do you think is the best approach to tackle this huge effort?
Should we just get to work and start converting each method in every DAO?
Should we try to make some small script that looks at each method, parses out relevant information and makes iBatis mappings from that?
For maintenance and separation purposes, should we have 1 iBatis mapping for each DAO?
I apologize if the question is vague, but I am just looking for someone who has gone through this type of thing before and has some pointers or 'lessons learned'.
The first thing you should do is cover your DAO layer in tests. This way you'll know if you broke something during the conversion. If you are moving a stored procedure from one DBMS to Oracle, you should also write tests for that using a framework like DbUnit.
You should have a TEST DB instance populated with sample data that doesn't change. You should be able to refresh this DB with the same set of sample data after you are done running your tests. This will ensure your TEST DB is in a known state. You will then have your input parameters paired with some expected (correct) result. Your test will read in these pairs, execute them against the test DB instance, and confirm the expected result is returned. Assuming your tests mutate the DB, you'll want to refresh the DB between runs of your test suite.
Second, if you're already going in and changing some data access implementations for Oracle, why not use this as an opportunity to move some of that business logic out of the DB and into Java? There are many well-documented problems with maintaining large codebases in a DBMS.
Should we try to make some small script that looks at each method, parses out relevant information and makes iBatis mappings from that?
I don't recommend this. The time you'd spend tweaking the script for each special case, plus hunting down all the bugs it would introduce, would be better spent having a thinking human do the conversion.
For maintenance and separation purposes, should we have 1 iBatis mapping for each DAO?
That's a fine idea. You can then combine them in your sqlMapConfig with
<sqlMap resource="sqlMaps/XXX.xml" />
This will keep your mappings more manageable. Just make sure to specify the namespace attribute in each sqlMap like:
<sqlMap namespace="User">
So that you can reuse mappings between the sqlMaps for instantiating object graphs (example: when loading a User and his Permissions, the User.xml sqlMap calls the Permission.xml mapping).
All our DAO's call Stored Procedures
I don't see what iBatis is buying you here.
It's also not clear what the migration is. Are you saying that you've decided to move all the code into stored procedures, so there's no more in-line SQL? If that's the case, I'd say don't use iBatis. If you're already using Spring, let it call into Oracle using its StoredProcedure object and map the cursors into objects.
The recommendation to create JUnit or, better yet, TestNG tests is spot on. Do that before changing anything.
