Can't get spring batch conditional flows working - spring

I'm having trouble getting a conditional spring batch flow to work using java config. The samples I've seen in spring batch samples, or spring batch's test code, or on stack overflow tend to show a conditional where a single step needs to be executed on condition, or it's the final step, or both. That's not the case I need to solve.
In procedural pseudo code, I want it to behave like
initStep()
if decision1()
subflow1()
middleStep()
if decision2()
subflow2()
lastStep()
So, subflow1 and 2 are conditional, but init, middle and last always execute. Here's my stripped down test case. In the current configuration, it just quits after executing subflow1.
public class FlowJobTest {
private JobBuilderFactory jobBuilderFactory;
private JobRepository jobRepository;
private JobExecution execution;
#BeforeMethod
public void setUp() throws Exception {
jobRepository = new MapJobRepositoryFactoryBean().getObject();
jobBuilderFactory = new JobBuilderFactory(jobRepository);
execution = jobRepository.createJobExecution("flow", new JobParameters());
}
#Test
public void figureOutFlowJobs() throws Exception {
JobExecutionDecider subflow1Decider = decider(true);
JobExecutionDecider subflow2Decider = decider(false);
Flow subflow1 = new FlowBuilder<Flow>("subflow-1").start(echo("subflow-1-Step-1")).next(echo("subflow-1-Step-2")).end();
Flow subflow2 = new FlowBuilder<Flow>("subflow-2").start(echo("subflow-2-Step-1")).next(echo("subflow-2-Step-2")).end();
Job job = jobBuilderFactory.get("testJob")
.start(echo("init"))
.next(subflow1Decider)
.on("YES").to(subflow1)
.from(subflow1Decider)
.on("*").to(echo("middle"))
.next(subflow2Decider)
.on("YES").to(subflow2)
.from(subflow2Decider)
.on("*").to(echo("last"))
.next(echo("last"))
.build().preventRestart().build();
job.execute(execution);
assertEquals(execution.getStatus(), BatchStatus.COMPLETED);
assertEquals(execution.getStepExecutions().size(), 5);
}
private Step echo(String stepName) {
return new AbstractStep() {
{
setName(stepName);
setJobRepository(jobRepository);
}
#Override
protected void doExecute(StepExecution stepExecution) throws Exception {
System.out.println("step: " + stepName);
stepExecution.upgradeStatus(BatchStatus.COMPLETED);
stepExecution.setExitStatus(ExitStatus.COMPLETED);
jobRepository.update(stepExecution);
}
};
}
private JobExecutionDecider decider(boolean decision) {
return (jobExecution, stepExecution) -> new FlowExecutionStatus(decision ? "YES" : "NO");
}
}

Your original job definition should work, as well, with only a small tweak. The test was failing because the job finished (with status COMPLETED) after the first sub flow. If you instruct it to continue to the middle step instead, it should work as intended. Similar adjustment for the second flow.
Job job = jobBuilderFactory.get("testJob")
.start(echo("init"))
.next(subflow1Decider)
.on("YES").to(subflow1).next(echo("middle"))
.from(subflow1Decider)
.on("*").to(echo("middle"))
.next(subflow2Decider)
.on("YES").to(subflow2).next(echo("last"))
.from(subflow2Decider)
.on("*").to(echo("last"))
.build().preventRestart().build();

The approach I used to make this work was to break my conditional flows into flow steps.
public void figureOutFlowJobsWithFlowStep(boolean decider1, boolean decider2, int expectedSteps) throws Exception {
JobExecutionDecider subflow1Decider = decider(decider1);
JobExecutionDecider subflow2Decider = decider(decider2);
Flow subFlow1 = new FlowBuilder<Flow>("sub-1")
.start(subflow1Decider)
.on("YES")
.to(echo("sub-1-1")).next(echo("sub-1-2"))
.from(subflow1Decider)
.on("*").end()
.end();
Flow subFlow2 = new FlowBuilder<Flow>("sub-2")
.start(subflow2Decider)
.on("YES").to(echo("sub-2-1")).next(echo("sub-2-2"))
.from(subflow2Decider)
.on("*").end()
.end();
Step subFlowStep1 = new StepBuilder("sub1step").flow(subFlow1).repository(jobRepository).build();
Step subFlowStep2 = new StepBuilder("sub2step").flow(subFlow2).repository(jobRepository).build();
Job job = jobBuilderFactory.get("testJob")
.start(echo("init"))
.next(subFlowStep1)
.next(echo("middle"))
.next(subFlowStep2)
.next(echo("last"))
.preventRestart().build();
job.execute(execution);
assertEquals(execution.getStatus(), BatchStatus.COMPLETED);
assertEquals(execution.getStepExecutions().size(), expectedSteps);
}

Related

Uses of JobExecutionDecider in Spring Batch split flow using SimpleAsyncTaskExecutor

I want to configure a Spring Batch job with 4 steps. Step-2 and Step-3 are independent to each other. So I want to execute then in parallel. Any of these 2 steps or both can be skipped depending on Execution Parameter. Check the flow as mentioned below :
Batch flow details
Java Configuration as mentioned below:
#Bean
public Job sampleBatchJob()
throws Exception {
final Flow step1Flow = new FlowBuilder<SimpleFlow>("step1Flow")
.from(step1Tasklet()).end();
final Flow step2Flow = new FlowBuilder<SimpleFlow>("step2Flow")
.from(new step2FlowDecider()).on("EXECUTE").to(step2MasterStep())
.from(new step2FlowDecider()).on("SKIP").end(ExitStatus.COMPLETED.getExitCode())
.build();
final Flow step3Flow = new FlowBuilder<SimpleFlow>("step3Flow")
.from(new step3FlowDecider()).on("EXECUTE").to(step3MasterStep())
.from(new step3FlowDecider()).on("SKIP").end(ExitStatus.COMPLETED.getExitCode())
.build();
final Flow splitFlow = new FlowBuilder<Flow>("splitFlow")
.split(new SimpleAsyncTaskExecutor())
.add(step2Flow, step3Flow)
.build();
return jobBuilderFactory().get("sampleBatchJob")
.start(step1Flow)
.next(splitFlow)
.next(step4MasterStep())
.end()
.build();
}
Sample code for Step2FlowDecider:
public class Step2FlowDecider
implements JobExecutionDecider {
#Override
public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
if (StringUtils.equals("Y", batchParameter.executeStep2())) {
return new FlowExecutionStatus("EXECUTE");
}
return new FlowExecutionStatus("SKIP");
}
}
With this configuration, when I try to execute the batch, it is getting failed, without any details error log.

Spring Batch question for email summary at the end of all jobs

We have approximately 20 different Spring Batch jobs (some running as microservices, some lumped together in one Spring Boot app). What I need to do is gather all the errors encountered by ALL the jobs, as well as the number of records processed, and summarize it all in an email.
I have implemented ItemListenerSupport as a start:
public class BatchItemListener extends ItemListenerSupport<BaseDomainDataObject, BaseDomainDataObject> {
private final static Log logger = LogFactory.getLog(BatchItemListener.class);
private final static Map<String, Integer> numProcessedMap = new HashMap<>();
private final static Map<String, Integer> errorMap = new HashMap<>();
#Override
public void onReadError(Exception ex) {
logger.error("Encountered error on read", ex);
}
#Override
public void onProcessError(BaseDomainDataObject item, Exception ex) {
String msgBody = ExceptionUtils.getStackTrace(ex);
errorMap.put(item, msgBody);
}
#Override
public void onWriteError(Exception ex, List<? extends BaseDomainDataObject> items) {
logger.error("Encountered error on write", ex);
numProcessedMap.computeIfAbsent("numErrors", val -> items.size());
}
#Override
public void afterWrite(List<? extends BaseDomainDataObject> items) {
logger.info("Logging successful number of items written...");
numProcessedMap.computeIfAbsent("numSuccess", val -> items.size());
}
}
But how to I access the errors I accumulate in the listener when my batch jobs are finally finished? Right now I don't even have a good way to know when they are all finished. Any suggestions? Does Spring Batch provide something better for summarizing jobs?
Spring Batch does not provide a way to orchestrate jobs. The closest you can get out of the box is using a "master" job with multiple steps of type Jobstep that delegate to your sub-jobs. with this approach, you can do the aggregation in a JobExecutionListener#afterJob configured on the master job.
Otherwise, you can Spring Cloud Data Flow and create a composed task of all your jobs.

Spring batch stop a job

How can I stop a job in spring batch ? I tried to use this method using the code below:
public class jobListener implements JobExecutionListener{
#Override
public void beforeJob(JobExecution jobExecution) {
jobExecution.setExitStatus(ExitStatus.STOPPED);
}
#Override
public void afterJob(JobExecution jobExecution) {
// TODO Auto-generated method stub
}
}
I tried also COMPLETED,FAILED but this method doesn't work and the job continues to execute. Any solution?
You can use JobOperator along with JobExplorer to stop a job from outside the job (see https://docs.spring.io/spring-batch/reference/html/configureJob.html#JobOperator). The method is stop(long executionId) You would have to use JobExplorer to find the correct executionId to stop.
Also from within a job flow config you can configure a job to stop after a steps execution based on exit status (see https://docs.spring.io/spring-batch/trunk/reference/html/configureStep.html#stopElement).
I assume you want to stop a job by a given name.
Here is the code.
String jobName = jobExecution.getJobInstance().getJobName(); // in most cases
DataSource dataSource = ... //#Autowire it in your code
JobOperator jobOperator = ... //#Autowire it in your code
JobExplorerFactoryBean factory = new JobExplorerFactoryBean();
factory.setDataSource(dataSource);
factory.setJdbcOperations(new JdbcTemplate(dataSource));
JobExplorer jobExplorer = factory.getObject();
Set<JobExecution> jobExecutions = jobExplorer.findRunningJobExecutions(jobName);
jobExecutions.forEach(jobExecution -> jobOperator.stop(jobExecution.getId()));

Return job id "immediately" for spring batch job before it completes

I am working on a project where we are using Spring Boot, Spring Batch and Camel.
The batch process is started by a call to a rest endpoint. The rest controller starts a camel route that starts the spring batch job flow (via spring batch camel component).
I have no control over the external application that calls my application. My application is part of a bigger nightly work flow.
The batch job can take a long time to complete and therefore the external application periodically polls my batch job via another rest endpoint asking if the job is complete. It does this by polling a status rest endpoint with the id of the jobExecution it wants a status on.
To accomplish this flow I have implemented a rest controller that starts the camel route via a ProducerTemplate. My problem is returning the job execution id immediately after starting the camel route. I don't want the rest call to wait until the job is complete to return.
startJobViaRestCall ------> createBatchJob ----> runBatchJobUntilDone
|
|
Return jobExecutionData |
<----------------------------------
I have tried using async calls and futures, but with no luck. I have also tried to use Camels wiretap to no avail. The problem is that there is only "onComplete" events. I need an hook that returns as soon as the job has been created, but not run.
For example, the following code waits until the batch job is done before returning the JobExecution data I want to send back (as json). It makes sense as extractFutureBody will wait until the response is ready.
#RestController
#Slf4j
public class BatchJobController {
#Autowired
ProducerTemplate producerTemplate;
#RequestMapping(value = "/batch/job/start", method = RequestMethod.GET)
#ResponseBody
public String startBatchJob() {
log.info("BatchJob start called...");
String jobExecution = producerTemplate.extractFutureBody(producerTemplate.asyncRequestBody(BatchRoute.ENDPOINT_JOB_START, ""), String.class);
return jobExecution;
}
}
The camel route is a simple call to the spring-batch-component
public class BatchRoute<I, O> extends BaseRoute {
private static final String ROUTE_START_BATCH = "spring-batch:springBatchJob";
#Override
public void configure() {
super.configure();
from(ENDPOINT_JOB_START).to(ROUTE_START_BATCH);
}
}
Any ideas as to how I can return the JobExecution data as soon as it is available?
Not sure How you could do it in Camel, but here is sample Job execution using spring-rest.
#RestController
public class KpRest {
private static final Logger LOG = LoggerFactory.getLogger(KpRest.class);
private static String RUN_ID_KEY = "run.id";
#Autowired
private JobLauncher launcher;
private final AtomicLong incrementer = new AtomicLong();
#Autowired
private Job job;
#RequestMapping("/hello")
public String sayHello(){
try {
JobParameters parameters = new JobParametersBuilder().addLong(RUN_ID_KEY, incrementer.incrementAndGet()).toJobParameters();
JobExecution execution = launcher.run(job, parameters);
LOG.info("JobId {}, JobStatus {}", execution.getJobId(), execution.getStatus().getBatchStatus());
return String.valueOf(execution.getJobId());
} catch (JobExecutionAlreadyRunningException | JobRestartException | JobInstanceAlreadyCompleteException
| JobParametersInvalidException e) {
LOG.info("Job execution failed, {}", e);
}
return "Some Error";
}
}
You can make the Job async by modifying JobLauncher.
#Bean
public JobLauncher simpleJobLauncher(JobRepository jobRepository){
SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
jobLauncher.setJobRepository(jobRepository);
jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
return jobLauncher;
}
Refer the documentation for more info

Spring batch : Assemble a job rather than configuring it (Extensible job configuration)

Background
I am working on designing a file reading layer that can read delimited files and load it in a List. I have decided to use Spring Batch because it provides a lot of scalability options which I can leverage for different sets of files depending on their size.
The requirement
I want to design a generic Job API that can be used to read any delimited file.
There should be a single Job structure that should be used for parsing every delimited file. For example, if the system needs to read 5 files, there will be 5 jobs (one for each file). The only way the 5 jobs will be different from each other is that they will use a different FieldSetMapper, column name, directory path and additional scaling parameters such as commit-interval and throttle-limit.
The user of this API should not need to configure a Spring
batch job, step, chunking, partitioning, etc on his own when a new file type is introduced in the system.
All that the user needs to do is to provide the FieldsetMapperto be used by the job along with the commit-interval, throttle-limit and the directory where each type of file will be placed.
There will be one predefined directory per file. Each directory can contain multiple files of the same type and format. A MultiResourcePartioner will be used to look inside a directory. The number of partitions = number of files in the directory.
My requirement is to build a Spring Batch infrastructure that gives me a unique job I can launch once I have the bits and pieces that will make up the job.
My solution :
I created an abstract configuration class that will be extended by concrete configuration classes (There will be 1 concrete class per file to be read).
#Configuration
#EnableBatchProcessing
public abstract class AbstractFileLoader<T> {
private static final String FILE_PATTERN = "*.dat";
#Autowired
JobBuilderFactory jobs;
#Autowired
ResourcePatternResolver resourcePatternResolver;
public final Job createJob(Step s1, JobExecutionListener listener) {
return jobs.get(this.getClass().getSimpleName())
.incrementer(new RunIdIncrementer()).listener(listener)
.start(s1).build();
}
public abstract Job loaderJob(Step s1, JobExecutionListener listener);
public abstract FieldSetMapper<T> getFieldSetMapper();
public abstract String getFilesPath();
public abstract String[] getColumnNames();
public abstract int getChunkSize();
public abstract int getThrottleLimit();
#Bean
#StepScope
#Value("#{stepExecutionContext['fileName']}")
public FlatFileItemReader<T> reader(String file) {
FlatFileItemReader<T> reader = new FlatFileItemReader<T>();
String path = file.substring(file.indexOf(":") + 1, file.length());
FileSystemResource resource = new FileSystemResource(path);
reader.setResource(resource);
DefaultLineMapper<T> lineMapper = new DefaultLineMapper<T>();
lineMapper.setFieldSetMapper(getFieldSetMapper());
DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer(",");
tokenizer.setNames(getColumnNames());
lineMapper.setLineTokenizer(tokenizer);
reader.setLineMapper(lineMapper);
reader.setLinesToSkip(1);
return reader;
}
#Bean
public ItemProcessor<T, T> processor() {
// TODO add transformations here
return null;
}
#Bean
#JobScope
public ListItemWriter<T> writer() {
ListItemWriter<T> writer = new ListItemWriter<T>();
return writer;
}
#Bean
#JobScope
public Step readStep(StepBuilderFactory stepBuilderFactory,
ItemReader<T> reader, ItemWriter<T> writer,
ItemProcessor<T, T> processor, TaskExecutor taskExecutor) {
final Step readerStep = stepBuilderFactory
.get(this.getClass().getSimpleName() + " ReadStep:slave")
.<T, T> chunk(getChunkSize()).reader(reader)
.processor(processor).writer(writer).taskExecutor(taskExecutor)
.throttleLimit(getThrottleLimit()).build();
final Step partitionedStep = stepBuilderFactory
.get(this.getClass().getSimpleName() + " ReadStep:master")
.partitioner(readerStep)
.partitioner(
this.getClass().getSimpleName() + " ReadStep:slave",
partitioner()).taskExecutor(taskExecutor).build();
return partitionedStep;
}
/*
* #Bean public TaskExecutor taskExecutor() { return new
* SimpleAsyncTaskExecutor(); }
*/
#Bean
#JobScope
public Partitioner partitioner() {
MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
Resource[] resources;
try {
resources = resourcePatternResolver.getResources("file:"
+ getFilesPath() + FILE_PATTERN);
} catch (IOException e) {
throw new RuntimeException(
"I/O problems when resolving the input file pattern.", e);
}
partitioner.setResources(resources);
return partitioner;
}
#Bean
#JobScope
public JobExecutionListener listener(ListItemWriter<T> writer) {
return new JobCompletionNotificationListener<T>(writer);
}
/*
* Use this if you want the writer to have job scope (JIRA BATCH-2269). Also
* change the return type of writer to ListItemWriter for this to work.
*/
#Bean
public TaskExecutor taskExecutor() {
return new SimpleAsyncTaskExecutor() {
#Override
protected void doExecute(final Runnable task) {
// gets the jobExecution of the configuration thread
final JobExecution jobExecution = JobSynchronizationManager
.getContext().getJobExecution();
super.doExecute(new Runnable() {
public void run() {
JobSynchronizationManager.register(jobExecution);
try {
task.run();
} finally {
JobSynchronizationManager.close();
}
}
});
}
};
}
}
Let's say I have to read Invoice data for the sake of discussion. I can therefore extend the above class for creating an InvoiceLoader :
#Configuration
public class InvoiceLoader extends AbstractFileLoader<Invoice>{
private class InvoiceFieldSetMapper implements FieldSetMapper<Invoice> {
public Invoice mapFieldSet(FieldSet f) {
Invoice invoice = new Invoice();
invoice.setNo(f.readString("INVOICE_NO");
return e;
}
}
#Override
public FieldSetMapper<Invoice> getFieldSetMapper() {
return new InvoiceFieldSetMapper();
}
#Override
public String getFilesPath() {
return "I:/CK/invoices/partitions/";
}
#Override
public String[] getColumnNames() {
return new String[] { "INVOICE_NO", "DATE"};
}
#Override
#Bean(name="invoiceJob")
public Job loaderJob(Step s1,
JobExecutionListener listener) {
return createJob(s1, listener);
}
#Override
public int getChunkSize() {
return 25254;
}
#Override
public int getThrottleLimit() {
return 8;
}
}
Let's say I have one more class called Inventory that extends AbstractFileLoader.
On application startup, I can load these two annotation configurations as follows :
AbstractApplicationContext context1 = new AnnotationConfigApplicationContext(InvoiceLoader.class, InventoryLoader.class);
Somewhere else in my application two different threads can launch the jobs as follows :
Thread 1 :
JobLauncher jobLauncher1 = context1.getBean(JobLauncher.class);
Job job1 = context1.getBean("invoiceJob", Job.class);
JobExecution jobExecution = jobLauncher1.run(job1, jobParams1);
Thread 2 :
JobLauncher jobLauncher1 = context1.getBean(JobLauncher.class);
Job job1 = context1.getBean("inventoryJob", Job.class);
JobExecution jobExecution = jobLauncher1.run(job1, jobParams1);
The advantage of this approach is that everytime there is a new file to be read, all that the developer/user has to do is subclass AbstractFileLoader and implement the required abstract methods without the need to get into the details of how to assemble the job.
The questions :
I am new to Spring batch so I may have overlooked some of the not-so-obvious issues with this approach such as shared internal objects in Spring batch that may cause two jobs running together to fail or obvious issues such as scoping of the beans.
Is there a better way to achieve my objective?
The fileName attribute of the #Value("#{stepExecutionContext['fileName']}") is always being assigned the value as I:/CK/invoices/partitions/ which is the value returned by getPathmethod in InvoiceLoader even though the getPathmethod inInventoryLoader`returns a different value.
One option is passing them as job parameters. For instance:
#Bean
Job job() {
jobs.get("myJob").start(step1(null)).build()
}
#Bean
#JobScope
Step step1(#Value('#{jobParameters["commitInterval"]}') commitInterval) {
steps.get('step1')
.chunk((int) commitInterval)
.reader(new IterableItemReader(iterable: [1, 2, 3, 4], name: 'foo'))
.writer(writer(null))
.build()
}
#Bean
#JobScope
ItemWriter writer(#Value('#{jobParameters["writerClass"]}') writerClass) {
applicationContext.classLoader.loadClass(writerClass).newInstance()
}
With MyWriter:
class MyWriter implements ItemWriter<Integer> {
#Override
void write(List<? extends Integer> items) throws Exception {
println "Write $items"
}
}
Then executed with:
def jobExecution = launcher.run(ctx.getBean(Job), new JobParameters([
commitInterval: new JobParameter(3),
writerClass: new JobParameter('MyWriter'), ]))
Output is:
INFO: Executing step: [step1]
Write [1, 2, 3]
Write [4]
Feb 24, 2016 2:30:22 PM org.springframework.batch.core.launch.support.SimpleJobLauncher$1 run
INFO: Job: [SimpleJob: [name=myJob]] completed with the following parameters: [{commitInterval=3, writerClass=MyWriter}] and the following status: [COMPLETED]
Status is: COMPLETED, job execution id 0
#1 step1 COMPLETED
Full example here.

Resources