How to make execution contexts in a Spring Batch Partitioner run in sequence

I have a requirement where I first have to select a number of master records from a table, and then for each master record fetch a number of child rows and process and write them chunk-wise.
To do this I used a Partitioner in Spring Batch and created master and slave steps. The code works fine if I don't need to run the slave step in the same sequence in which the execution contexts were added.
But my requirement is to run the slave step for each execution context in the same sequence in which it was added in the partitioner, because until I process the parent record I cannot process its child records.
With the partitioner, the slave step does not run in that sequence. Please help me: how can I maintain the same sequence for the slave step runs?
Is there another way to achieve this using Spring Batch? Any help is welcome.
<job id="EPICSDBJob" xmlns="http://www.springframework.org/schema/batch">
<!-- Create Order Master Start -->
<step id="populateNewOrdersMasterStep" allow-start-if-complete="false"
next="populateLineItemMasterStep">
<partition step="populateNewOrders" partitioner="pdcReadPartitioner">
<handler grid-size="1" task-executor="taskExecutor" />
</partition>
<batch:listeners>
<batch:listener ref="partitionerStepListner" />
</batch:listeners>
</step>
<!-- Create Order Master End -->
<listeners>
<listener ref="epicsPimsJobListner" />
</listeners>
</job>
<step id="populateNewOrders" xmlns="http://www.springframework.org/schema/batch">
<tasklet allow-start-if-complete="true">
<chunk reader="epicsDBReader" processor="epicsPimsProcessor"
writer="pimsWriter" commit-interval="10">
</chunk>
</tasklet>
<batch:listeners>
<batch:listener ref="stepJobListner" />
</batch:listeners>
</step>
<bean id="epicsDBReader" class="com.cat.epics.sf.batch.reader.EPICSDBReader" scope="step" >
<property name="sfObjName" value="#{stepExecutionContext[sfParentObjNm]}" />
<property name="readChunkCount" value="10" />
<property name="readerDao" ref="readerDao" />
<property name="configDao" ref="configDao" />
<property name="dBReaderService" ref="dBReaderService" />
</bean>
Partitioner Method:
@Override
public Map<String, ExecutionContext> partition(int arg0) {
    Map<String, ExecutionContext> result = new LinkedHashMap<String, ExecutionContext>();
    List<String> sfMappingObjectNames = configDao.getSFMappingObjNames();
    int i = 1;
    for (String sfMappingObjectName : sfMappingObjectNames) {
        ExecutionContext value = new ExecutionContext();
        value.putString("sfParentObjNm", sfMappingObjectName);
        result.put("partition:" + i, value);
        i++;
    }
    return result;
}

There isn't a way to guarantee order within Spring Batch's partitioning model. The fact that the partitions are executed in parallel means that, by definition, there will be no ordering to the records processed. I think this is a case where restructuring the job a bit may help.
If your requirement is to execute the parent then execute the children, using a driving query pattern along with the partitioning would work. You'd partition along the parent records (which it looks like you're doing), then in the worker step, you'd use the parent record to drive queries and processing for the child records. That would guarantee that the child records are processed after the parent one.
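To illustrate the driving query pattern, here is a minimal sketch of a worker-step processor that loads the children itself, so the children are always handled after their parent. This is not from the original post; the table and column names are hypothetical:

public class ParentDrivenProcessor implements org.springframework.batch.item.ItemProcessor<java.util.Map<String, Object>, java.util.List<java.util.Map<String, Object>>> {

    private org.springframework.jdbc.core.JdbcTemplate jdbcTemplate;

    @Override
    public java.util.List<java.util.Map<String, Object>> process(java.util.Map<String, Object> parent) throws Exception {
        // Driving query: the parent's key drives the child lookup,
        // so children cannot be processed before their parent row.
        java.util.List<java.util.Map<String, Object>> children = jdbcTemplate.queryForList(
                "select * from child_table where parent_id = ?", parent.get("id"));
        // ... child-level processing would go here ...
        return children;
    }

    public void setJdbcTemplate(org.springframework.jdbc.core.JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }
}

The reader of the worker step then only reads parent rows for its partition, and the writer persists the processed children.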

Related

Get a jobExecutionContext value in Spring Batch XML config from a previous step

I am defining my MultiResourceItemReader this way:
<bean id="multiDataItemReader" class="org.springframework.batch.item.file.MultiResourceItemReader" scope="step">
<property name="resources" value="#{jobExecutionContext['filesResource']}"/>
<property name="delegate" ref="dataItemReader"/>
</bean>
As you can see, I want to read the "filesResource" value from the jobExecutionContext.
Note: I changed some names to keep the code private. The job executes; if somebody wants more info, please tell me.
I am saving this value in my first step and using the reader in the second step. Should I have access to it?
I am saving it in the final lines of my step1 tasklet:
ExecutionContext jobContext = context.getStepContext().getStepExecution().getJobExecution().getExecutionContext();
jobContext.put("filesResource", resourceString);
<batch:job id="myJob">
<batch:step id="step1" next="step2">
<batch:tasklet ref="moveFilesFromTasklet" />
</batch:step>
<batch:step id="step2">
<tasklet>
<chunk commit-interval="500"
reader="multiDataItemReader"
processor="dataItemProcessor"
writer="dataItemWriter" />
</tasklet>
</batch:step>
</batch:job>
I am not really sure what I am forgetting to get the value. The error that I am getting is:
20190714 19:49:08.120 WARN org.springframework.batch.item.file.MultiResourceItemReader [[ # ]] - No resources to read. Set strict=true if this should be an error condition.
I see nothing wrong with your config. The value of resourceString should be an array of org.springframework.core.io.Resource as this is the parameter type of the resources attribute of MultiResourceItemReader.
You can pass an array or a list of String with the absolute path to each resource and it should work. Here is a quick example:
import java.util.Arrays;
import java.util.List;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

class MyTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // Store the resource paths in the job execution context so a later step can read them.
        List<String> resources = Arrays.asList(
                "/full/path/to/resource1",
                "/full/path/to/resource2");
        chunkContext.getStepContext().getStepExecution().getJobExecution().getExecutionContext()
                .put("filesResource", resources);
        return RepeatStatus.FINISHED;
    }
}
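If you would rather store actual Resource objects than path strings, a variant of the same tasklet body could build them with Spring's standard FileSystemResource; a sketch, using the same hypothetical paths:

import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;

// Inside the tasklet's execute(...) method:
Resource[] resources = new Resource[] {
        new FileSystemResource("/full/path/to/resource1"),
        new FileSystemResource("/full/path/to/resource2")
};
chunkContext.getStepContext().getStepExecution().getJobExecution()
        .getExecutionContext().put("filesResource", resources);

This matches the Resource[] type of the MultiResourceItemReader resources property directly, so no conversion is needed.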

Spring Batch tasklet configuration

My batch job takes 15 different files as input, and the first step in my job is a validation step which validates the header and the total number of records; the next step is the chunk-based tasklet.
Now, for validation I have different if-else cases in my tasklet to handle the different validations for the files, and the cases are based on the file name and extension, which are passed as job parameters. I need to know whether the tasklet reference for my validation step can be decided at runtime, i.e. I'll have bean references for validators for the 15 files, and based on the file extension in the job parameters the tasklet ref should be chosen at runtime.
<batch:job id="downloadFile">
<batch:step id="validator" >
<batch:tasklet ref="InteropValidator"/>
<batch:next on="FAILED" to="AckStep"/>
<batch:next on="COMPLETED" to="loadFiles"/>
</batch:step>
<batch:step id="loadFiles" next="AckStep">
<partition step="slave" partitioner="partitionerFactory">
<handler grid-size="1" task-executor="taskExecutor" />
</partition>
</batch:step>
<batch:step id="AckStep" >
<batch:tasklet ref="Acksteptasklet"/>
<batch:fail on="FAILED"/>
</batch:step>
</batch:job>
In InteropValidator.java, I have implemented the Tasklet interface and written a code snippet as below:
if("ICTX".equals(FilenameUtils.getExtension(filename)))
{
if(fileValidated && detailValid && agencyValid )
{
cc.getStepContext().getStepExecution().getJobExecution().getExecutionContext().put("STATUSCODE","01");
sc.setExitStatus(ExitStatus.COMPLETED);
}
}
if("SML".equals(FilenameUtils.getExtension(filename)))
{
//Validations for SML File
}
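One way to pick the validator at runtime (not from the original thread, just a sketch with hypothetical bean and parameter names) is to keep a single tasklet reference in the XML but make it a delegating tasklet that looks up the real validator by file extension:

import java.util.Map;

import org.apache.commons.io.FilenameUtils;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

// Hypothetical delegating tasklet: the validators map (extension -> tasklet)
// would be wired in the XML config with one entry per file type.
public class DelegatingValidatorTasklet implements Tasklet {

    private Map<String, Tasklet> validators;

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        // Assumes a "filename" job parameter is always present.
        String filename = chunkContext.getStepContext().getJobParameters().get("filename").toString();
        Tasklet delegate = validators.get(FilenameUtils.getExtension(filename));
        return delegate.execute(contribution, chunkContext);
    }

    public void setValidators(Map<String, Tasklet> validators) {
        this.validators = validators;
    }
}

The step definition then stays fixed while the actual validation logic is selected per run.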

Spring JdbcCursorItemReader in spring batch application

My use case is as follows:
There is an Employee table and its columns are as follows:
employee_id;
employee_dob;
employee_lastName;
employee_firstName;
employee_zipCode
Now there is a use case to build a list of employees in dept 'A' with zipcode 11223, and also employees in dept 'B' with zipcode 33445.
I have configured a Spring job as follows:
<batch:job id="EmployeeDetailsJob" job-repository="EmpDaoRepository">
<batch:step id="loadEmployeeDetails" >
<batch:tasklet transaction-manager="EmployeeDAOTranManager">
<batch:chunk reader="EmpDaoJdbcCursorItemReader" writer="EmpDaoWriter" commit-interval="200" skip-limit="100">
<batch:skippable-exception-classes>
</batch:skippable-exception-classes>
</batch:chunk>
<batch:listeners>
<batch:listener ref="EmpDaoStepListener" />
</batch:listeners>
<batch:transaction-attributes isolation="DEFAULT" propagation="REQUIRED" timeout="300" />
</batch:tasklet>
</batch:step>
</batch:job>
The configuration of reader is as follows:
<bean id="EmpDaoJdbcCursorItemReader" class="EmpDaoJdbcCursorItemReader">
<property name="dataSource" ref="EmpDataSource" />
<property name="sql">
<value><![CDATA[select * from Employee where employee_id=? and employee_zipCode=? ]]>
</value>
</property>
<property name="fetchSize" value="100"></property>
<property name="rowMapper" ref="EmployeeMapper" />
</bean>
There is a class EmployeeQueryCriteria which has two fields: employee_id and employee_zipCode.
In one of the steps I will create an ArrayList of EmployeeQueryCriteria objects for which the data has to be fetched.
So my questions are:
1. Is there a way I can pass this list to the EmpDaoJdbcCursorItemReader so that it iterates through the objects and sets the parameter values from each EmployeeQueryCriteria object?
2. Can I loop through the step to read data for every item in the ArrayList of EmployeeQueryCriteria?
The class EmpDaoJdbcCursorItemReader:
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.item.database.JdbcCursorItemReader;

public class EmpDaoJdbcCursorItemReader extends JdbcCursorItemReader {

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        StringBuffer sqlQuerySB = new StringBuffer(super.getSql());
        // append "(" + the ids + ")" to the base query
        sqlQuerySB.append(" (").append(/* a comma separated list of employee ids */).append(")");
        super.setSql(sqlQuerySB.toString());
    }
}
My Spring configurations are as follows:
Spring-batch-core 2.2.2
Spring-beans 3.2.3
Spring-context 3.2.3
Can someone please provide suggestions on how to solve this problem?
You can iterate through the steps using the following flow model:
<decision id="testLoop" decider="iterationDecider">
<next on="CONTINUABLE" to="pqrStep" />
<end on="FINISHED" />
</decision>
<step id="pqrStep" next="xyzStep">
<tasklet ref="someTasklet" />
</step>
<step id="xyzStep" next="testLoop">
<tasklet ref="someOtherTasklet" />
</step>
and the configuration is:
<bean id="iterationDecider" class="com.xyz.StepFlowController" />
The following class handles the flow based on the condition:
public class StepFlowController implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        FlowExecutionStatus status = null;
        try {
            if (conditionIsTrue) { // your continue/finish condition
                status = new FlowExecutionStatus("CONTINUABLE");
            } else {
                status = new FlowExecutionStatus("FINISHED");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return status;
    }
}

Processing huge data with spring batch partitioning

I am implementing a Spring Batch job for processing millions of records in a DB table using the partitioning approach, as follows:
Fetch the unique partitioning codes from the table in a partitioner and set them in the execution context.
Create a chunk step with a reader, processor and writer to process the records for a particular partition code.
Is this approach proper, or is there a better approach for a situation like this? Some partition codes can have more records than others, so partitions with more records might take longer to process than those with fewer.
Is it possible to create partitions/threads such that thread 1 processes records 1-1000, thread 2 processes 1001-2000, etc.?
How do I control the number of threads created? The partition codes can number around 100, and I would like to create only 20 threads and process in 5 iterations.
What happens if one partition fails - will all processing stop and be reverted?
The configurations are as follows:
<bean id="MyPartitioner" class="com.MyPartitioner" />
<bean id="itemProcessor" class="com.MyProcessor" scope="step" />
<bean id="itemReader" class="org.springframework.batch.item.database.JdbcCursorItemReader" scope="step" >
<property name="dataSource" ref="dataSource"/>
<property name="sql" value="select * from mytable WHERE code = '#{stepExecutionContext[code]}' "/>
<property name="rowMapper">
<bean class="com.MyRowMapper" scope="step"/>
</property>
</bean>
<bean id="taskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor" >
<property name="corePoolSize" value="20"/>
<property name="maxPoolSize" value="20"/>
<property name="allowCoreThreadTimeOut" value="true"/>
</bean>
<batch:step id="Step1" xmlns="http://www.springframework.org/schema/batch">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="itemReader" processor="itemProcessor" writer="itemWriter" commit-interval="200"/>
</batch:tasklet>
</batch:step>
<batch:job id="myjob">
<batch:step id="mystep">
<batch:partition step="Step1" partitioner="MyPartitioner">
<batch:handler grid-size="20" task-executor="taskExecutor"/>
</batch:partition>
</batch:step>
</batch:job>
Partitioner -
public class MyPartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitionMap = new HashMap<String, ExecutionContext>();
        List<String> codes = getCodes();
        for (String code : codes) {
            ExecutionContext context = new ExecutionContext();
            context.put("code", code);
            partitionMap.put(code, context);
        }
        return partitionMap;
    }
}
Thanks
I would say it is the right approach. I do not see why you would need one thread per 1000 items; if you partition by unique partitioning code and have a chunk size of 1000 items, you will have transactions of 1000 items per thread, which is IMO OK.
In addition to saving unique partitioning codes, you can count how many records you have for each partition code and partition even further, by creating a new subcontext for every 1000 records of the same partition code. That way, for a partition code that has e.g. 2200 records, you would invoke 3 threads with the context params: 1 => partition_key=key1, skip=0, count=1000; 2 => partition_key=key1, skip=1000, count=1000; and 3 => partition_key=key1, skip=2000, count=1000. That is possible if it is what you want, but I would still go without it.
The number of threads is controlled with the ThreadPoolTaskExecutor, which is passed to the partition step. The setCorePoolSize() method can be set to 20, and you will get at most 20 threads. The next fine-grained configuration is grid-size, which tells how many partitions will be created out of the full partition map. So partitioning is about dividing the work; after that, your threading configuration defines the concurrency of the actual processing.
If one partition fails, the whole partitioned step fails, with information about which partition failed. Successful partitions are done and are not invoked again; when the job restarts, it picks up where it left off by redoing the failed and unprocessed partitions.
Hope I covered all the questions you had, since there were many.
Example for case 1 - there may be mistakes, but just to give the idea:
public class MyPartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitionMap = new HashMap<String, ExecutionContext>();
        Map<String, Integer> codesWithCounts = getCodesWithCounts();
        for (Map.Entry<String, Integer> codeWithCount : codesWithCounts.entrySet()) {
            String code = codeWithCount.getKey();
            for (int i = 0; i < codeWithCount.getValue(); i += 1000) {
                ExecutionContext context = new ExecutionContext();
                context.put("code", code);
                context.put("skip", i);
                context.put("count", 1000);
                // the key must be unique per partition, so include the offset
                partitionMap.put(code + ":" + i, context);
            }
        }
        return partitionMap;
    }
}
And then you page by 1000, reading from the context how many records to skip, which in the example of 2200 records would be: 0, 1000 and 2000.
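A minimal sketch of how a worker step's reader might consume those skip/count values, assuming a database that supports LIMIT/OFFSET; the class name is hypothetical and the base SQL is whatever the bean was configured with:

import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.item.database.JdbcCursorItemReader;

// Appends paging to the base query before the partition's step runs.
// In a partitioned step, the partition's ExecutionContext becomes the
// step execution context, so "skip" and "count" are read from there.
public class PagedCodeItemReader extends JdbcCursorItemReader {

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        int skip = stepExecution.getExecutionContext().getInt("skip", 0);
        int count = stepExecution.getExecutionContext().getInt("count", 1000);
        // LIMIT/OFFSET syntax varies by database; adjust for yours.
        setSql(getSql() + " LIMIT " + count + " OFFSET " + skip);
    }
}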

How to control repeat and next step execution in Spring Batch

Hi, I have XML like the below for executing a job:
<batch:job id="Job1" restartable="false" xmlns="http://www.springframework.org/schema/batch">
<step id="step1" next="step2">
<tasklet ref="automate" />
</step>
<step id="step2">
<tasklet ref="drive" />
<next on="COMPLETED" to="step3"></next>
</step>
<step id="step3">
<tasklet ref="generate_file" />
</step>
</batch:job>
For this I have written a tasklet to execute a script. Now I want that if the script execution fails three times, the next step will not execute. But from a Tasklet I am able to return only FINISHED, which moves the flow to the next step, and CONTINUABLE, which continues the process. What should I do here?
You can write your own decider to decide whether to go to the next step or to end the job.
If you are able to detect the failures, you can also control the flow of the job:
<decision id="validationDecision" decider="validationDecider">
<next on="FAILED" to="abcStep" />
<next on="COMPLETE" to="xyzstep" />
</decision>
The config is:
<bean id="validationDecider" class="com.xyz.StepFlowController" />
The class is:
public class StepFlowController implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        FlowExecutionStatus status = null;
        try {
            if (failure) { // your failure condition
                status = new FlowExecutionStatus("FAILED");
            } else {
                status = new FlowExecutionStatus("COMPLETE");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return status;
    }
}
This can be achieved by specifying a custom "chunk-completion-policy" in that step and counting the number of failures. Take a look at "Stopping a Job Manually for Business Reasons" and the example of a custom chunk completion policy. Hope this helps.
EDIT: you can put the number of failures in the step execution context in your step logic and then retrieve it in your completion policy class:
stepExecution.getJobExecution().getExecutionContext().put("ERROR_COUNT", noOfErrors);
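A small sketch of how that counter could then gate the flow, e.g. in a decider placed before the next step; ERROR_COUNT is the key from the line above, and the threshold of three matches the question:

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

// Fails the flow once the previous step has recorded three or more errors.
public class ErrorCountDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        int errors = jobExecution.getExecutionContext().getInt("ERROR_COUNT", 0);
        return errors >= 3 ? FlowExecutionStatus.FAILED : FlowExecutionStatus.COMPLETED;
    }
}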
