I am using Spring Batch with the usual reader, processor and writer.
I have two questions.
1>
The reader queries all 200 records (the table holds 200 records in total and I have set pagesize=200), so it returns all of them. In the processor we want a list of all these records, because we have to compare each record with the other 199 to group them into different tiers.
If I could get that list in the processing step, I could manipulate it. How should I approach this?
2>
In the processing stage I need some master data from the database, on which the processing of all input records depends. I am thinking of injecting a data source into the processor bean, fetching all the master table data, and then processing all records. Is this a good approach? If not, please suggest an alternative.
<job id="sampleJob">
<step id="step1">
<tasklet>
<chunk reader="itemReader" processor="processor" writer="itemWriter" commit-interval="20"/>
</tasklet>
</step>
</job>
And the processor is
@Override
public User process(Object item) throws Exception {
// transform item to user
return user;
}
And I want something like
public List<User> process(List<Object> item) throws Exception {
// transform item to user
return user;
}
I found some posts here, but they say to get the list in the writer. I don't want to process anything in the writer, because that defeats the definition of writer and processor. Is there any configuration to get the list inside this process method?
Thank you
Since the ItemProcessor receives whatever you return from the ItemReader, you need your ItemReader to return the List. That List is really the "item" you're processing. There is an example of this in the Spring Batch Samples. The AggregateItemReader reads all the items from a delegate ItemReader and returns them as a single list. You can take a look at it on Github here: https://github.com/spring-projects/spring-batch/blob/master/spring-batch-samples/src/main/java/org/springframework/batch/sample/domain/multiline/AggregateItemReader.java
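A minimal sketch of that idea, written so it compiles without Spring Batch on the classpath. SimpleReader is a hypothetical stand-in for ItemReader (it keeps the same "read() returns null when the input is exhausted" contract but drops the checked exception), and WholeListReader and ListBackedReader are illustrative names, not the code of the sample's AggregateItemReader. The wrapper drains its delegate and returns one List, which then is the single "item" the processor receives.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Spring Batch's ItemReader<T>: read() returns null at end of input.
interface SimpleReader<T> {
    T read();
}

// Drains the delegate and returns everything as one List, which becomes the "item".
class WholeListReader<T> implements SimpleReader<List<T>> {
    private final SimpleReader<T> delegate;
    private boolean exhausted = false;

    WholeListReader(SimpleReader<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<T> read() {
        if (exhausted) {
            return null; // tells the step there is nothing left to read
        }
        List<T> all = new ArrayList<>();
        T next;
        while ((next = delegate.read()) != null) {
            all.add(next);
        }
        exhausted = true;
        return all.isEmpty() ? null : all;
    }
}

// Illustrative in-memory delegate, used here only to exercise the wrapper.
class ListBackedReader<T> implements SimpleReader<T> {
    private final Iterator<T> it;

    ListBackedReader(List<T> source) {
        this.it = source.iterator();
    }

    @Override
    public T read() {
        return it.hasNext() ? it.next() : null;
    }
}
```

With a reader of this shape, the processor's signature becomes process(List<User> items), and the commit-interval counts lists rather than individual records.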
Related
Spring Batch is designed to read and process one item at a time, then write the list of all items processed in a chunk. I want my item to be a List<T> as well, to be thus read and processed, and then write a List<List<T>>. My data source is a standard Spring JpaRepository<T, ID>.
My question is whether there are some standard solutions for this "aggregated" approach. I see that there are some, but they don't read from a JpaRepository, like:
https://github.com/spring-projects/spring-batch/blob/main/spring-batch-samples/src/main/java/org/springframework/batch/sample/domain/multiline/AggregateItemReader.java
Spring Batch - Item Reader and ItemProcessor with a list
Spring Batch- how to pass list of multiple items from input to ItemReader, ItemProcessor and ItemWriter
Update:
I'm looking for a solution that would work for a rapidly changing dataset and in a multithreading environment.
I want my item to be a List<T> as well, to be thus read and processed, and then write a List<List<T>>.
Spring Batch is not (and should not be) aware of what an "item" is. It is up to you to design what an "item" is and how it is implemented (a single value, a list, a stream, etc.). In your case, you can encapsulate the List<T> in a type that can be used as an item, and process the data as needed. You would need a custom item reader though.
The solution we found is to use a custom aggregate reader as suggested here, which accumulates the read data into a list of a given size then passes it along. For our specific use case, we read data using a JpaPagingItemReader. The relevant part is:
public List<T> read() throws Exception {
ResultHolder holder = new ResultHolder();
// read until no more results available or aggregated size is reached
while (!itemReaderExhausted && holder.getResults().size() < aggregationSize) {
process(itemReader.read(), holder);
}
if (CollectionUtils.isEmpty(holder.getResults())) {
return null;
}
return holder.getResults();
}
private void process(T readValue, ResultHolder resultHolder) {
if (readValue == null) {
itemReaderExhausted = true;
return;
}
resultHolder.addResult(readValue);
}
To account for the volatility of the dataset, we extended the JPA reader and overrode the getPage() method to always return 0. We then controlled the dataset through the processor and writer, so that the next fresh data to be fetched is always on the first page. The hint was given here and in some other SO answers.
public int getPage() {
return 0;
}
I have a Spring Batch reader with the following configuration.
This reader reads from the database, one page of records at a time.
@Autowired
private SomeCreditRepot someCreditRepo;
public RepositoryItemReader<SomeCreditModel> reader() {
RepositoryItemReader<SomeCreditModel> reader = new RepositoryItemReader<>();
reader.setRepository(someCreditRepo);
reader.setMethodName("someCreditTransfer");
.
.
..
return reader;
}
I want to call a utility method,
refValue = BatchProcessingUtil.generateSomeRefValue();
before the processor step, so that every record fetched by the reader gets the same value, namely the one returned by the call above.
That way all the entities fetched by the reader will have the same value in the processor.
This refValue will then be written to another table, StoreRefValue.
What is the right way to do this in Spring Batch?
Should I run the query that writes the refValue to the StoreRefValue table in the processor?
You can let your processor implement the interface StepExecutionListener. You'll then have to implement the methods afterStep and beforeStep. The first should simply return null, and in beforeStep you can call the utility method and save its return value.
Alternatively, you can use the annotation @BeforeStep. If you use the usual Java DSL, it's not required to explicitly add the processor as a listener to the step. Adding it as a processor should suffice.
There are more details in the reference documentation:
https://docs.spring.io/spring-batch/docs/current/reference/html/step.html#interceptingStepExecution
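The approach above can be sketched in plain Java as follows. This is only an illustration of "compute the value once before the step, then reuse it for every item": BeforeStepHook is a hypothetical stand-in for the beforeStep callback of StepExecutionListener (the real method takes a StepExecution argument), and CreditItem, RefValueProcessor and the Supplier parameter are made-up names. In the real job the supplier would presumably be BatchProcessingUtil::generateSomeRefValue.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the beforeStep callback of Spring Batch's StepExecutionListener.
interface BeforeStepHook {
    void beforeStep();
}

// Illustrative item type; refValue is the field every record should share.
class CreditItem {
    String refValue;
}

// Computes the reference value once per step and stamps it on every item.
class RefValueProcessor implements BeforeStepHook {
    private final Supplier<String> refValueSource;
    private String refValue;

    RefValueProcessor(Supplier<String> refValueSource) {
        this.refValueSource = refValueSource;
    }

    @Override
    public void beforeStep() {
        // Invoked once, before the first item of the step is processed.
        refValue = refValueSource.get();
    }

    public CreditItem process(CreditItem item) {
        // Every item of this step execution receives the same value.
        item.refValue = refValue;
        return item;
    }
}
```

Because the value is held in a field, a processor like this should be step-scoped (or otherwise not shared between concurrent step executions).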
I created a Spring Batch job which reads orders from MongoDB and makes a REST call to upload them. However, the batch job completes even though not all records have been read by the MongoItemReader.
I am maintaining a field batchProcessed:boolean on Orders collection. The MongoItemReader reads records for which {batchProcessed:{$ne:true}} as I need to run the batch job multiple times but not process the same documents again and again.
In my OrderWriter I set batchProcessed to true.
@Bean
@StepScope
public MongoItemReader<Order> orderReader() {
MongoItemReader<Order> reader = new MongoItemReader<>();
reader.setTemplate(mongoTempate);
HashMap<String,Sort.Direction> sortMap = new HashMap<>();
sortMap.put("_id",Direction.ASC);
reader.setSort(sortMap);
reader.setTargetType(Order.class);
reader.setQuery("{batchProcessed:{$ne:true}}");
return reader;
}
@Bean
public Step uploadOrdersStep(OrderItemProcessor processor) {
return stepBuilderFactory.get("step1").<Order, Order>chunk(1)
.reader(orderReader()).processor(processor).writer(orderWriter).build();
}
@Bean
public Job orderUploadBatchJob(JobBuilderFactory factory, OrderItemProcessor processor) {
return factory.get("uploadOrder").flow(uploadOrdersStep(processor)).end().build();
}
The MongoItemReader is a paging item reader. When you read items in pages and, at the same time, change items so that they no longer match the query (i.e. you update a field that is used in the query's "where" clause), the page offsets shift and some items can be skipped. There's a similar problem with the JPA paging item reader that is explained in detail here: Spring batch jpaPagingItemReader why some rows are not read?
Common techniques to work around this issue are to use a cursor-based reader, use a staging table/collection, use a partitioned step with a partition per page, etc.
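The skipping can be reproduced with a small plain-Java simulation. This is not MongoItemReader itself: PagingSkipDemo, fetchPage and run are made-up names, and the boolean array stands in for the batchProcessed flag. With advancePage=true the page number grows while processed rows drop out of the result set, so each new offset jumps over rows that were never read. With advancePage=false the first page is always re-read, which only terminates if every item read really does leave the result set.

```java
import java.util.ArrayList;
import java.util.List;

class PagingSkipDemo {
    // Simulates a paged query "where processed is not true" over an in-memory table.
    static List<Integer> fetchPage(boolean[] processed, int page, int pageSize) {
        List<Integer> result = new ArrayList<>();
        int skipped = 0;
        for (int id = 0; id < processed.length && result.size() < pageSize; id++) {
            if (processed[id]) {
                continue; // row no longer matches the query
            }
            if (skipped < page * pageSize) {
                skipped++; // the offset is applied to rows that still match
                continue;
            }
            result.add(id);
        }
        return result;
    }

    // Reads page by page and marks every item read as processed (as the writer would).
    // advancePage=true mimics a normal paging reader; false always re-reads page 0.
    static int run(int total, int pageSize, boolean advancePage) {
        boolean[] processed = new boolean[total];
        int handled = 0;
        int page = 0;
        while (true) {
            List<Integer> items = fetchPage(processed, page, pageSize);
            if (items.isEmpty()) {
                break; // the reader reports end of input
            }
            for (int id : items) {
                processed[id] = true;
                handled++;
            }
            if (advancePage) {
                page++;
            }
        }
        return handled;
    }
}
```

In this simulation, run(100, 10, true) handles only 50 of the 100 rows before the reader sees an empty page, while run(100, 10, false) handles all 100.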
I am using Spring Batch to read a flat file. The file has related records, i.e. there can be a parent record and any number of child records. I want to read all the records and call a web service to store them. I also want to capture the relationships and store them. One challenge is that a child record can be anywhere in the file, and a child can itself have many child records. I am unable to find a solution for this problem with Spring Batch.
Please provide your suggestions.
Update: I don't have any option to use a database as temporary storage for the data.
I solved a similar problem by processing the file multiple times.
On every pass I try to read and process every record in the file with this algorithm:
if the record has a parent - check whether the parent is already stored. If not - skip the record in the processor
if the record is unchanged (or already stored, if updates are not possible) - skip it in the processor
else - store it in the db
Then declare a loop and a decider:
<batch:step id="processParentAndChilds" next="loop">
<batch:tasklet>
<batch:chunk reader="processParentAndChildsReader"
commit-interval="1000">
<batch:processor>
<bean class="processParentAndChildsProcessor"/>
</batch:processor>
<batch:writer>
<bean class="processParentAndChildsWriter"/>
</batch:writer>
</batch:chunk>
</batch:tasklet>
</batch:step>
<batch:decision decider="processParentAndChildsRetryDecider" id="loop">
<batch:next on="NEXT_LEVEL" to="processParentAndChilds"/>
<batch:next on="COMPLETED" to="goToNextSteps"/>
</batch:decision>
public class ProcessParentAndChildsRetryDecider implements JobExecutionDecider{
@Override
public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
// if no record was written in this pass, there is no point in trying again
if (stepExecution.getWriteCount() > 0) {
return new FlowExecutionStatus("NEXT_LEVEL");
} else {
return FlowExecutionStatus.COMPLETED;
}
}
}
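The multi-pass loop above can be simulated in plain Java to see why it terminates. This is a sketch under assumptions: MultiPassDemo, Rec, pass and storeAll are made-up names, the in-memory Set stands in for the "is the parent already stored?" lookup, and a record is treated as stored immediately rather than at the next chunk commit.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MultiPassDemo {
    // Illustrative record: parentId == null marks a root record.
    static final class Rec {
        final String id;
        final String parentId;

        Rec(String id, String parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    // One pass over the file: store every record whose parent is already stored.
    // Returns the number of records written, which is what the decider inspects.
    static int pass(List<Rec> file, Set<String> stored, List<String> storeOrder) {
        int written = 0;
        for (Rec r : file) {
            if (stored.contains(r.id)) {
                continue; // already stored in an earlier pass
            }
            if (r.parentId != null && !stored.contains(r.parentId)) {
                continue; // parent not stored yet, retry in a later pass
            }
            stored.add(r.id);
            storeOrder.add(r.id);
            written++;
        }
        return written;
    }

    // Repeats passes until one writes nothing, like the NEXT_LEVEL / COMPLETED loop.
    static List<String> storeAll(List<Rec> file) {
        Set<String> stored = new HashSet<>();
        List<String> storeOrder = new ArrayList<>();
        while (pass(file, stored, storeOrder) > 0) {
            // loop again: newly stored parents may unblock more children
        }
        return storeOrder;
    }
}
```

Records whose parent never appears in the file are simply never stored, so the loop still ends once a pass writes nothing.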
My batched statements in mybatis are timing out. I'd like to throttle the load I'm sending to the database by flushing the statements periodically. In iBATIS, I used a callback, something like this:
sqlMapClientTemplate.execute(new SqlMapClientCallback<Integer>() {
@Override
public Integer doInSqlMapClient(SqlMapExecutor executor)
throws SQLException {
executor.startBatch();
int tally = 0;
for (Foo foo: foos) {
executor.insert("FooSql.insertFoo",foo.getData());
/* executes batch when > MAX_TALLY */
tally = BatchHelper.updateTallyOnMod(executor, tally);
}
return executor.executeBatch();
}
});
Is there a better way to do this in mybatis? Or do I need to do the same type of thing with SqlSessionCallback? This feels cumbersome. What I'd really like to do is configure the project to flush every N batched statements.
I did not get any responses, so I'll share the solution I settled on.
Mybatis provides direct access to statement flushing. I autowired the SqlSession, used Guava to partition the collection into manageable chunks, then flushed the statements after each chunk.
Iterable<List<Foo>> partitions = Iterables.partition(foos, MAX_TALLY);
for (List<Foo> partition : partitions) {
for (Foo foo : partition) {
mapper.insertFoo(foo);
}
sqlSession.flushStatements();
}
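If Guava is not available, the same chunk-then-flush pattern can be written with List.subList. The sketch below uses made-up names (ChunkedFlush, insertInChunks) and takes the insert and flush actions as parameters, so it does not depend on MyBatis; it assumes the session behind the mapper uses the BATCH executor type, otherwise there is nothing to flush.

```java
import java.util.List;
import java.util.function.Consumer;

class ChunkedFlush {
    // Walks the list in consecutive chunks of at most chunkSize items, calls
    // insert for each item, and runs flush once after every chunk.
    // Returns the number of flushes performed.
    static <T> int insertInChunks(List<T> items, int chunkSize,
                                  Consumer<T> insert, Runnable flush) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int flushes = 0;
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            for (T item : items.subList(start, end)) {
                insert.accept(item);
            }
            flush.run(); // send the statements batched so far to the database
            flushes++;
        }
        return flushes;
    }
}
```

With the names from the snippet above, the call would look roughly like ChunkedFlush.insertInChunks(foos, MAX_TALLY, mapper::insertFoo, sqlSession::flushStatements).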
Sorry for the late response, but I just stumbled onto this question right now. However, hopefully it will help others with a similar problem.
You don't need to explicitly autowire the SqlSession. You can use the mapper interface itself. In the mapper interface, simply define a method that is annotated with the @Flush annotation and has a return type of List<BatchResult>. Here's an example of such a method in the mapper interface:
@Flush
List<BatchResult> flushBatchedStatements();
Then simply call this method on your mapper object like so:
Iterable<List<Foo>> partitions = Iterables.partition(foos, MAX_TALLY);
for (List<Foo> partition : partitions)
{
for (Foo foo : partition)
{
mapper.insertFoo(foo);
}
mapper.flushBatchedStatements(); // flushes all statements batched up to this point into the table
}
Note that you don't need to add anything special into your mapper XML file to support this type of statement flushing via the mapper interface. Your XML mapper may simply be something like
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd" >
<mapper namespace=".....">
<insert id="bulkInsertIntoTable" parameterType="myPackage.Foo">
insert into MyDatabaseTable(col1, col2, col3)
values
( #{fooObj.data1}, #{fooObj.data2}, #{fooObj.data3} )
</insert>
</mapper>
The only thing that is needed is that you use MyBatis 3.3 or higher. Here's what the MyBatis docs on the MyBatis website state:
If this annotation is used, it can be called the
SqlSession#flushStatements() via method defined at a Mapper
interface.(MyBatis 3.3 or above)
For more details please visit the MyBatis official documentation site:
http://www.mybatis.org/mybatis-3/java-api.html