Spring Batch execution with 2 modes

I have a requirement to execute a job in two modes, using a parameter to distinguish between them. For example, if the user passes the parameter X, the job must read the data from the database and export all the records to a single XML file. If the user passes the parameter Y, the job must write each record to a separate XML file using the same header.

Use jobParameter to distinguish the modes:
@StepScope
@Bean
public Tasklet task(@Value("#{jobParameters['mode']}") String mode) {
    return (contribution, chunkContext) -> {
        // branch on the value of "mode" here
        return RepeatStatus.FINISHED;
    };
}
If the modes are so different that you can't do them in the same step, use a decider:
FlowBuilder<Flow> flowBuilder = new FlowBuilder<>("modesFlow");
Flow flow = flowBuilder
        .start(modesDecider)
        .on("X").to(step1)
        .from(modesDecider)
        .on("Y").to(step2)
        .build();

jobBuilderFactory.get("modesJob")
        .incrementer(new RunIdIncrementer())
        .start(flow)
        .end()
        .build();
where ModesDecider implements JobExecutionDecider.
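For illustration, a minimal sketch of such a decider, assuming the job is launched with the "mode" parameter used above (the status names simply mirror the transitions in the flow):
public class ModesDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // route the flow based on the "mode" job parameter
        String mode = jobExecution.getJobParameters().getString("mode");
        return "X".equals(mode) ? new FlowExecutionStatus("X") : new FlowExecutionStatus("Y");
    }
}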

Related

Dynamic name for a spring batch job

I defined a very simple Spring Batch job, like the one below. I want to change its registered name using a received parameter (which is added to the Spring Batch job parameter list as jobName).
@Bean
@JobScope // this doesn't work, throws exception 'No context holder available for job scope'
Job genericJob(JobNotifierListener listener,
               Step genericStep1, Step genericStep2,
               @Value("#{jobParameters['jobName']}") String jobName) {
    return jobBuilderFactory.get(jobName + "GenericJob")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(genericStep1)
            .next(genericStep2)
            .build();
}
How can I configure the job so that its name is changed dynamically using the input batch parameter jobName? (Adding @JobScope to access the Spring Batch context doesn't work; it throws the error shown above.)
The job name should not be a job parameter. Job parameters are designed for "business" runtime parameters, not technical configuration parameters. An application property or a system property is better suited for your case:
@Bean
// @JobScope // no need for this
Job genericJob(JobNotifierListener listener,
               Step genericStep1, Step genericStep2,
               @Value("#{systemProperties['jobName']}") String jobName) {
    return jobBuilderFactory.get(jobName + "GenericJob")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(genericStep1)
            .next(genericStep2)
            .build();
}
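With this approach the name can be supplied at launch time, for example with java -DjobName=export -jar app.jar (the jar name and value are only illustrative), and the job would then be registered as exportGenericJob.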

Spring Batch Conditional Flow - The second flow always goes into status FAILED

I have created a Spring Batch app and I'm struggling to implement a simple flow with a condition. Here's what I want to implement:
I tried to achieve this by implementing the following code:
@Bean
public Job job(JobCompletionNotificationListener listener) {
    return jobs.get(Constants.JOB_SIARD_FILES_PROCESSOR + new Date().getTime())
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(step1())
            .next(decider()).on("yes").to(step2345Flow())
            .end()
            .build();
}

@Bean
public Flow step2345Flow() {
    return new FlowBuilder<SimpleFlow>("yes_flow")
            .start(step2())
            .next(step3())
            .next(step4())
            .next(step5())
            .build();
}
When the condition is "yes" the flow works just fine, but when the condition is "no" the flow always ends with execution status FAILED. I want it to be COMPLETED, just like the first flow, but without executing steps 2, 3, 4 and 5.
Hope anyone can help me with this.
Spring Batch does not allow alternative branches in the flow to be implicit. In other words, you need an on(...) for each case.
Assuming decider() yields a proxied bean, it should work fine with
@Bean
public Job job(JobCompletionNotificationListener listener) {
    return jobs.get(Constants.JOB_SIARD_FILES_PROCESSOR + new Date().getTime())
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(step1())
            .next(decider()).on("yes").to(step2345Flow())
            .from(decider()).on("no").end()
            .end()
            .build();
}
To cover really all cases, you can also use on("*") instead of on("no").
Please also have a second look at the official documentation: https://docs.spring.io/spring-batch/docs/4.3.x/reference/html/index-single.html#controllingStepFlow
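As a side note, a minimal decider bean for this "yes"/"no" routing could look like the sketch below; the condition shouldRunRemainingSteps() is purely hypothetical and stands for whatever check drives the decision:
@Bean
public JobExecutionDecider decider() {
    return (jobExecution, stepExecution) ->
            shouldRunRemainingSteps()  // hypothetical condition
                    ? new FlowExecutionStatus("yes")
                    : new FlowExecutionStatus("no");
}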

spring batch restart counter

I have been playing with a Spring Batch job that reads a sample CSV file and dumps the records into a table.
My question is about restarts. I introduced a data issue in the file (a value too long to insert) in the 3rd line.
In the first run:
The first two lines get inserted and the third line fails (as expected).
When I restart:
The fourth line is picked up and the rest of the file is processed.
All the documentation seems to suggest that Spring Batch picks up where it left off. Does that mean the 3rd (problem) record is considered 'attempted' and hence won't be tried again? I was expecting all restarts to fail until I fixed the file.
@Bean
public FlatFileItemReader<Person> reader() {
    return new FlatFileItemReaderBuilder<Person>()
            .name("personItemReader")
            .resource(new ClassPathResource("sample-data.csv"))
            .delimited()
            .names(new String[]{"firstName", "lastName"})
            .fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
                setTargetType(Person.class);
            }})
            .build();
}

@Bean
public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<Person>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)")
            .dataSource(dataSource)
            .build();
}

@Bean
public Step step1(JdbcBatchItemWriter<Person> writer) {
    return stepBuilderFactory.get("step1")
            .<Person, Person>chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(writer)
            .taskExecutor(taskExecutor())
            .throttleLimit(1)
            .build();
}

@Bean
public Job importUserJob(JobCompletionNotificationListener listener) {
    return jobBuilderFactory.get("importUserJob")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .start(step1)
            .build();
}
Please let me know whether you have gone through the links below. If it's not clear, I can share the same sample project on GitHub:
Spring Batch restart uncompleted jobs from the same execution and step
Spring Batch correctly restart uncompleted jobs in clustered environment
In production we always use a fault-tolerant step, so that the job rejects the wrong data and continues. Operations will later correct the data and re-execute the job. The advantage here is that a huge volume of data can be processed continuously, with no need to wait for data correction.
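For illustration, a fault-tolerant variant of the step from the question might look like the sketch below; the exception types and skip limit are only examples, not part of the original code:
@Bean
public Step step1(JdbcBatchItemWriter<Person> writer) {
    return stepBuilderFactory.get("step1")
            .<Person, Person>chunk(1)
            .reader(reader())
            .writer(writer)
            .faultTolerant()
            // skip bad records instead of failing the whole job
            .skip(FlatFileParseException.class)
            .skip(DataIntegrityViolationException.class)
            .skipLimit(10)
            .build();
}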
Please compare your code with below
https://github.com/ngecom/stackoverflow-springbatchRestart
You have set a RunIdIncrementer on your job, so you will have a new job instance on each run. You need to remove that incrementer and pass the file as a job parameter to have the same job instance on each run. With this approach, all restarts will fail until you fix the file.
As a side note, you can't have restartability if you use a multi-threaded step. This is because the state would not be consistent when using multiple threads. So you need to use a single threaded-step (remove the task executor). This is explained in the documentation here: Multi-threaded step.
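As an illustration of passing the file as a job parameter, the reader could be made step-scoped; the parameter name inputFile is only an example:
@Bean
@StepScope
public FlatFileItemReader<Person> reader(@Value("#{jobParameters['inputFile']}") String inputFile) {
    return new FlatFileItemReaderBuilder<Person>()
            .name("personItemReader")
            // the file now comes from the job parameters, so restarts are tied to the same file
            .resource(new FileSystemResource(inputFile))
            .delimited()
            .names(new String[]{"firstName", "lastName"})
            .fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
                setTargetType(Person.class);
            }})
            .build();
}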

Spring batch flow declaration using java config

I am reading the Spring Batch documentation and I am stuck on the following part, where this example is provided:
@Bean
public Job job() {
    Flow flow1 = new FlowBuilder<SimpleFlow>("flow1")
            .start(step1())
            .next(step2())
            .build();

    Flow flow2 = new FlowBuilder<SimpleFlow>("flow2")
            .start(step3())
            .build();

    return this.jobBuilderFactory.get("job")
            .start(flow1)
            .split(new SimpleAsyncTaskExecutor())
            .add(flow2)
            .next(step4())
            .end()
            .build();
}
But it is not explained what is happening.
As far as I understand, flow1 and flow2 are executed in parallel, but what about step4?
step4() is executed sequentially, after flow1 and flow2 have returned.
Look at the FlowBuilder.SplitBuilder.add() javadoc :
public FlowBuilder<Q> add(Flow... flows)
Add flows to the split, in addition to the current state already
present in the parent builder.
Parameters:
flows - more flows to add to the split
Returns: the parent builder
It returns the parent builder and not the current SplitBuilder object. So step4() is not included in the split and is therefore executed sequentially.
To run the 3 flows in parallel:
Flow flow3 = new FlowBuilder<SimpleFlow>("flow3")
        .start(step4())
        .build();

return this.jobBuilderFactory.get("job")
        .start(flow1)
        .split(new SimpleAsyncTaskExecutor())
        .add(flow2, flow3)
        .end()
        .build();

Spring batch repeat step ending up in never ending loop

I have a Spring Batch job in which I'd like to do the following...
Step 1 -
Tasklet - Create a list of dates, store the list of dates in the job execution context.
Step 2 -
JDBC Item Reader - Get the list of dates from the job execution context.
Get element(0) of the dates list and use it as input for the JDBC query.
Store the element(0) date in the job execution context.
Remove the element(0) date from the list of dates.
Flat File Item Writer - Get the element(0) date from the job execution context and use it for the file name.
Then, using a job listener, repeat step 2 until there are no remaining dates in the list of dates.
I've created the job and it works okay for the first execution of step 2, but step 2 is not repeating as I want it to. I know this because when I debug through my code it only breaks for the initial run of step 2.
It does, however, continue to give me messages like the ones below, as if it were running step 2, even when I know it is not.
2016-08-10 22:20:57.842 INFO 11784 --- [ main] o.s.batch.core.job.SimpleStepHandler : Duplicate step [readStgDbAndExportMasterListStep] detected in execution of job=[exportMasterListCsv]. If either step fails, both will be executed again on restart.
2016-08-10 22:20:57.846 INFO 11784 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [readStgDbAndExportMasterListStep]
This ends up in a never ending loop.
Could someone help me figure out, or give a suggestion as to, why my step 2 is only running once?
Thanks in advance.
I've added two links to PasteBin for my code so as not to pollute this post.
http://pastebin.com/QhExNikm (Job Config)
http://pastebin.com/sscKKWRk (Common Job Config)
http://pastebin.com/Nn74zTpS (Step execution listener)
From your question and your code I deduce that, based on the number of dates that you retrieve (this happens before the actual job starts), you want to execute a step once for each date.
I suggest a design change. Create a Java class that will get you the dates as a list, and based on that list dynamically create your steps. Something like this:
@EnableBatchProcessing
public class JobConfig {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private JobDatesCreator jobDatesCreator;

    @Bean
    public Job executeMyJob() {
        List<Step> steps = new ArrayList<Step>();
        for (String date : jobDatesCreator.getDates()) {
            steps.add(createStep(date));
        }
        return jobBuilderFactory.get("executeMyJob")
                .start(createParallelFlow(steps))
                .end()
                .build();
    }

    private Step createStep(String date) {
        return stepBuilderFactory.get("readStgDbAndExportMasterListStep" + date)
                .chunk(your_chunksize)
                .reader(your_reader)
                .processor(your_processor)
                .writer(your_writer)
                .build();
    }

    private Flow createParallelFlow(List<Step> steps) {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
        // max multithreading = -1, no multithreading = 1, smart size = steps.size()
        taskExecutor.setConcurrencyLimit(1);

        List<Flow> flows = steps.stream()
                .map(step -> new FlowBuilder<Flow>("flow_" + step.getName()).start(step).build())
                .collect(Collectors.toList());

        return new FlowBuilder<SimpleFlow>("parallelStepsFlow")
                .split(taskExecutor)
                .add(flows.toArray(new Flow[flows.size()]))
                .build();
    }
}
EDIT: added "jobParameter" input (slightly different approach also)
Somewhere on your classpath add the following example .properties file:
sql.statement="select * from awesome"
and add the following annotation to your JobDatesCreator class
@PropertySource("classpath:example.properties")
You can also provide specific SQL statements as command line arguments. From the Spring documentation:
you can launch with a specific command line switch (e.g. java -jar
app.jar --name="Spring").
For more info on that see http://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html
The class that gets your dates (why use a tasklet for this?):
@PropertySource("classpath:example.properties")
public class JobDatesCreator {

    @Value("${sql.statement}")
    private String sqlStatement;

    @Autowired
    private CommonExportFromStagingDbJobConfig commonJobConfig;

    private List<String> dates;

    @PostConstruct
    private void init() {
        // Execute your logic here for getting the data you need.
        JdbcTemplate jdbcTemplate = new JdbcTemplate(commonJobConfig.onlineStagingDb);

        // access to your SQL statement provided in a property file or as a command line argument
        System.out.println("This is the sql statement I provided in my external property: " + sqlStatement);

        // for now..
        dates = new ArrayList<>();
        dates.add("date 1");
        dates.add("date 2");
    }

    public List<String> getDates() {
        return dates;
    }

    public void setDates(List<String> dates) {
        this.dates = dates;
    }
}
I also noticed that you have a lot of duplicate code that you could quite easily refactor. Now, for each writer, you have something like this:
@Bean
public FlatFileItemWriter<MasterList> division10MasterListFileWriter() {
    FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource(new File(outDir, MerchHierarchyConstants.DIVISION_NO_10)));
    writer.setHeaderCallback(masterListFlatFileHeaderCallback());
    writer.setLineAggregator(masterListFormatterLineAggregator());
    return writer;
}
Consider using something like this instead:
public FlatFileItemWriter<MasterList> divisionMasterListFileWriter(String divisionNumber) {
    FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource(new File(outDir, divisionNumber)));
    writer.setHeaderCallback(masterListFlatFileHeaderCallback());
    writer.setLineAggregator(masterListFormatterLineAggregator());
    return writer;
}
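Each step can then reuse this single factory method with its own division number, for example .writer(divisionMasterListFileWriter(MerchHierarchyConstants.DIVISION_NO_10)), instead of declaring a near-identical writer bean per division.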
As not all of the code is available to reproduce your issue exactly, this answer is a suggestion/indication of how to solve your problem.
Based on our discussion in "Spring batch execute dynamically generated steps in a tasklet", I'm trying to answer the question of how to access jobParameters before the job is actually executed.
I assume that there is a REST call which will execute the batch. In general, this will require the following steps:
1. a piece of code that receives the REST call with its parameters
2. creation of a new Spring context (there are ways to reuse an existing context and launch the job again, but there are some issues when it comes to reuse of steps, readers and writers)
3. launching the job
The simplest solution would be to store the job parameter received from the service as a system property and then access this property when you build up the job in step 3. But this could lead to a problem if more than one user starts the job at the same moment.
There are other ways to pass parameters into the Spring context when it is loaded, but that depends on the way you set up your context.
For instance, if you are using Spring Boot directly for step 2, you could write a method like:
private int startJob(Properties jobParamsAsProps) {
    SpringApplication springApp = new SpringApplication(.. my config classes ..);
    springApp.setDefaultProperties(jobParamsAsProps);

    ConfigurableApplicationContext context = springApp.run();
    ExitCodeGenerator exitCodeGen = context.getBean(ExitCodeGenerator.class);
    int code = exitCodeGen.getExitCode();
    context.close();
    return code;
}
This way, you can access the properties as usual with the standard @Value or @ConfigurationProperties annotations.
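For example, a value passed in via setDefaultProperties could then be injected like any other property (the property name processDate is purely illustrative):
// hypothetical property name; any key present in jobParamsAsProps can be injected this way
@Value("${processDate}")
private String processDate;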
