Separating steps into classes in Spring Batch

I have tried to find a solution, but I could not.
I want to separate the steps of a job like this:
step1.class -> step2.class -> step3.class -> done
The reason I split them this way is that each step has to run its own queries.
@Bean
public Job bundleJob() {
return jobBuilderFactory.get(JOB_NAME)
.start(step1) // bean
.next(step2) // bean
.next(step3()) // and here is the code ex) reader, processor, writer
.build();
}
My goal is to use the data returned by step1 and step2 in the later steps.
But JpaItemReader seems to behave asynchronously, so the steps are not processed in the order above.
The debug flow looks like this:
readerStep1 -> writerStep1 -> readerStep2 -> writerStep2 -> readerStep3 -> writerStep3
and
-> processorStep1 -> processorStep2 -> processorStep3
That is a big problem for me.
How can I make each step, including its queries, wait for the previous one?

Aha, I got it.
The key is when the beans are created in the configuration.
I had annotated every step component with @Bean, so Spring created all of them up front.
The solution is late binding with @JobScope or @StepScope:
@Bean
@StepScope // late binding: the bean is created when the step starts
public ListItemReader<Dto> itemReader() {
// business logic
return new ListItemReader<>(dto);
}

To keep the steps of your job separate, you can use a Flow with a TaskletStep. Here is a snippet for reference:
@Bean
public Job processJob() throws Exception {
Flow fetchData = (Flow) new FlowBuilder<>("fetchData")
.start(fetchDataStep()).build();
Flow transformData = (Flow) new FlowBuilder<>("transformData")
.start(transformDataStep()).build();
Job job = jobBuilderFactory.get("processTenantLifeCycleJob").incrementer(new RunIdIncrementer())
.start(fetchData).next(transformData).next(processData()).end()
.listener(jobCompletionListener()).build();
ReferenceJobFactory referenceJobFactory = new ReferenceJobFactory(job);
registry.register(referenceJobFactory);
return job;
}
@Bean
public TaskletStep fetchDataStep() {
return stepBuilderFactory.get("fetchData")
.tasklet(fetchDataValue()).listener(fetchDataStepListener()).build();
}
@Bean
@StepScope
public FetchDataValue fetchDataValue() {
return new FetchDataValue();
}
@Bean
public TaskletStep transformDataStep() {
return stepBuilderFactory.get("transformData")
.tasklet(transformValue()).listener(sendReportDataCompletionListener()).build();
}
@Bean
@StepScope
public TransformValue transformValue() {
return new TransformValue();
}
@Bean
public Step processData() {
return stepBuilderFactory.get("processData").<String, Data>chunk(chunkSize)
.reader(processDataReader()).processor(dataProcessor()).writer(processDataWriter())
.listener(processDataListener())
.taskExecutor(backupTaskExecutor()).build();
}
In this example I used two Flows to fetch and transform the data; each one runs a Tasklet defined in its own class.
To pass the values produced by steps 1 and 2 on, you can store them in the job execution context and retrieve them in the processData step, which has a reader, a processor and a writer.
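A minimal sketch of that hand-off, assuming the fetched value is a small list of strings. The key name "fetchedIds" and the ProcessDataReaderConfig class are made up for illustration, and the hard-coded list stands in for the real step 1 query. Both classes are shown in one listing for brevity.

```java
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.util.ArrayList;
import java.util.List;

// Step 1 tasklet: runs its query, then publishes the result to the job context.
class FetchDataValue implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // Placeholder for the real step 1 query (assumption).
        ArrayList<String> ids = new ArrayList<>(List.of("id-1", "id-2"));

        // The job-level context is shared by all steps of this job execution.
        ExecutionContext jobContext = chunkContext.getStepContext()
                .getStepExecution()
                .getJobExecution()
                .getExecutionContext();
        jobContext.put("fetchedIds", ids); // "fetchedIds" is a hypothetical key

        return RepeatStatus.FINISHED;
    }
}

@Configuration
class ProcessDataReaderConfig {

    // Step-scoped, so the SpEL expression is resolved when the step starts,
    // i.e. after step 1 has already written the value.
    @Bean
    @StepScope
    public ListItemReader<String> processDataReader(
            @Value("#{jobExecutionContext['fetchedIds']}") List<String> ids) {
        return new ListItemReader<>(ids);
    }
}
```

The execution context is persisted together with the job metadata, so this probably works best for small, serializable values such as ids or counters. Larger result sets are likely better kept in a table or a file that the next step reads.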

Related

Spring Batch - delete

How can I delete the entities that I have just persisted?
@Bean
public Job job() {
return this.jobBuilderFactory.get("job")
.start(this.syncStep())
.build();
}
@Bean
public Step syncStep() {
// read
RepositoryItemReader<Element1> reader = new RepositoryItemReader<>();
reader.setRepository(repository);
reader.setMethodName("findElements");
reader.setArguments(new ArrayList<>(Arrays.asList(ZonedDateTime.now())));
final HashMap<String, Sort.Direction> sorts = new HashMap<>();
sorts.put("uid", Sort.Direction.ASC);
reader.setSort(sorts);
// write
RepositoryItemWriter<Element1> writer = new RepositoryItemWriter<>();
writer.setRepository(otherrepository);
writer.setMethodName("save");
return stepBuilderFactory.get("syncStep")
.<Element1, Element2> chunk(10)
.reader(reader)
.processor(processor)
.writer(writer)
.build();
}
It is a process of dumping elements. We pass the elements from one table to another.
You can do that in two steps. The first step copies items from one table to another. The second step deletes the items from the source table. The second step should be executed only if the first step succeeds.
There are a few options:
Using a CompositeItemWriter
You could create a second ItemWriter that does the delete logic, for example:
RepositoryItemWriter<Element1> deleteWriter = new RepositoryItemWriter<>();
deleteWriter.setRepository(repository);
deleteWriter.setMethodName("delete");
To execute both writers you can use a CompositeItemWriter:
CompositeItemWriter<Element1> writer = new CompositeItemWriter<>();
// 'saveWriter' would be the writer you currently have
writer.setDelegates(List.of(saveWriter, deleteWriter));
This however won't work if your ItemProcessor transforms the original entity to something completely new. In that case I suggest using PropertyExtractingDelegatingItemWriter.
(Note, according to this question the writers run sequentially and the second writer should not be executed if the first one fails, but I'm not 100% sure on that.)
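To illustrate the sequential behaviour mentioned in the note, here is a plain-Java model of a composite writer. It does not use Spring Batch: SimpleWriter and SequentialCompositeWriter are hypothetical stand-ins, written on the assumption that CompositeItemWriter simply calls its delegates in list order and lets exceptions propagate. Check the library source before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for an item writer; NOT Spring Batch's ItemWriter.
interface SimpleWriter<T> {
    void write(List<? extends T> items);
}

// Calls each delegate in list order. An exception thrown by one delegate
// propagates immediately, so the delegates after it are never invoked.
class SequentialCompositeWriter<T> implements SimpleWriter<T> {
    private final List<SimpleWriter<T>> delegates;

    SequentialCompositeWriter(List<SimpleWriter<T>> delegates) {
        this.delegates = List.copyOf(delegates);
    }

    @Override
    public void write(List<? extends T> items) {
        for (SimpleWriter<T> delegate : delegates) {
            delegate.write(items);
        }
    }
}

class CompositeWriterDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        SimpleWriter<String> save = items -> log.add("saved " + items);
        SimpleWriter<String> delete = items -> log.add("deleted " + items);
        SimpleWriter<String> failingSave = items -> {
            throw new IllegalStateException("save failed");
        };

        // Both delegates run, in the order they were registered.
        new SequentialCompositeWriter<String>(List.of(save, delete)).write(List.of("a", "b"));
        System.out.println(log); // [saved [a, b], deleted [a, b]]

        // The first delegate fails, so the delete delegate is never reached.
        log.clear();
        try {
            new SequentialCompositeWriter<String>(List.of(failingSave, delete)).write(List.of("a", "b"));
        } catch (IllegalStateException e) {
            System.out.println("delete skipped, log = " + log); // delete skipped, log = []
        }
    }
}
```

In a real chunk-oriented step both writers would also run inside the same chunk transaction, so a failure in either one should roll back the whole chunk.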
Using a separate Step
Alternatively, you could put the new writer in an entirely separate Step:
@Bean
public Step cleanupStep() {
// Same reader as before (might want to put this in a separate @Bean)
RepositoryItemReader<Element1> reader = new RepositoryItemReader<>();
// ...
// The 'deleteWriter' from before
RepositoryItemWriter<Element1> deleteWriter = new RepositoryItemWriter<>();
// ...
return stepBuilderFactory.get("cleanupStep")
.<Element1, Element2> chunk(10)
.reader(reader)
.writer(deleteWriter)
.build();
}
Now you can run the two steps one after the other in the job:
@Bean
public Job job() {
return this.jobBuilderFactory.get("job")
.start(this.syncStep())
.next(this.cleanupStep())
.build();
}
Using a Tasklet
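If you use a separate step, then depending on the amount of data it may be more efficient to offload the deletion entirely to the database and execute a single delete ... where ... query.
public class CleanupRepositoryTasklet implements Tasklet {
private final Repository repository;
// Constructor injection; without it the final field is never initialized.
public CleanupRepositoryTasklet(Repository repository) {
this.repository = repository;
}
@Override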
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
repository.customDeleteMethod();
return RepeatStatus.FINISHED;
}
}
This Tasklet can then be registered in the same way as before, by declaring a new Step in your configuration:
return this.stepBuilderFactory.get("cleanupStep")
.tasklet(myTasklet())
.build();

Spring-batch step not re-executed (cache)

I'm working on a project that uses Spring Batch. Before showing the code snippets, here is a short summary of how the job is triggered by a cron:
the cron calls a REST API in my project (@PostMapping("/jobs/external/{jobName}"))
in the POST method, I look up the job and execute it.
each execution is supposed to run a step.
the step contains a reader (an external REST call to the Elastic API to fetch documents) and a processor.
Now my problem: in catalina.out, I can see the REST call from the cron every 10 minutes, as configured. But the step does not seem to call Elastic every 10 minutes: the batch always processes the same set of data, which is fetched only once, when the batch is called during the Tomcat restart.
Job REST API:
@PostMapping("/jobs/external/{jobName}")
@Timed
public ResponseEntity start(@PathVariable String jobName) throws BatchException {
log.info("LAUNCHING JOB FROM EXTERNAL : {}, timestamp : {}", jobName, Instant.now().toString());
try {
Job job = jobRegistry.getJob(jobName);
JobParametersBuilder builder = new JobParametersBuilder();
builder.addDate("date", new Date());
return Optional.of(jobLauncher.run(job, builder.toJobParameters()))
.map(BatchExecutionVM::new)
.map(exec -> ResponseEntity
.ok()
.headers(HeaderUtil.createAlert("jobManagement.started", jobName))
.body(exec))
.orElseGet(() -> ResponseEntity.badRequest().build());
} catch (NoSuchJobException aEx) {
log.warn(JOB_NOT_FOUND, aEx);
throw new BatchException();
} catch (JobInstanceAlreadyCompleteException | JobExecutionAlreadyRunningException | JobRestartException aEx) {
log.warn("Job execution error.", aEx);
throw new BatchException();
} catch (JobParametersInvalidException aEx) {
log.warn("Job parameters are invalid.", aEx);
throw new BatchException();
}
}
Job configuration:
@Bean
public Job usualJob() {
return jobBuilderFactory
.get("usualJob")
.incrementer(new SimpleJobIncrementer())
.flow(readUsualStep())
.end()
.build();
}
@Bean
public Step readUsualStep() {
// TODO: simplify, we don't need chunk processing here
return stepBuilderFactory.get("readUsualStep")
.allowStartIfComplete(true)
.<AlertDocument, Void>chunk(25)
.readerIsTransactionalQueue()
.reader(rowItemReader())
.processor(rowItemProcessor())
.build();
}
@Bean
public ItemReader<AlertDocument> rowItemReader() {
return new UsualItemReader(usualService.getLastAlerts());
}
@Bean
public UsualMapRowProcessor rowItemProcessor() {
return new UsualMapRowProcessor();
}
I don't know why usualService.getLastAlerts() is called just once and not every 10 minutes.
Thanks to M. Deinum, this is basically the solution:
@Bean
@StepScope
public ItemReader<AlertDocument> rowItemReader() {
return new UsualItemReader(usualService.getLastAlerts());
}
Annotating the reader bean with @StepScope makes Spring create a new instance for every step execution, so getLastAlerts() is called again on each run instead of once at startup.
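The difference can be shown with a plain-Java analogy. This is not how Spring implements scopes (it uses scoped proxies); AlertService, EagerReader and PerExecutionReaderFactory are hypothetical names that only model "fetch once at construction" versus "fetch for every execution".

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for usualService; counts how often it is queried.
class AlertService {
    private final AtomicInteger calls = new AtomicInteger();

    List<String> getLastAlerts() {
        return List.of("alerts-from-call-" + calls.incrementAndGet());
    }

    int callCount() {
        return calls.get();
    }
}

// Like the singleton bean: the data is passed in once, at construction time.
class EagerReader {
    private final List<String> alerts;

    EagerReader(List<String> alerts) {
        this.alerts = alerts;
    }

    List<String> read() {
        return alerts;
    }
}

// Like a step-scoped bean: a new reader is built for every execution,
// so the service is queried again each time.
class PerExecutionReaderFactory {
    private final Supplier<List<String>> source;

    PerExecutionReaderFactory(Supplier<List<String>> source) {
        this.source = source;
    }

    EagerReader newReader() {
        return new EagerReader(source.get());
    }
}

class LateBindingDemo {
    public static void main(String[] args) {
        AlertService service = new AlertService();

        // Singleton behaviour: one fetch, then the same data on every run.
        EagerReader singleton = new EagerReader(service.getLastAlerts());
        System.out.println(singleton.read()); // [alerts-from-call-1]
        System.out.println(singleton.read()); // [alerts-from-call-1]

        // Step-scope-like behaviour: a fresh fetch per execution.
        PerExecutionReaderFactory scoped = new PerExecutionReaderFactory(service::getLastAlerts);
        System.out.println(scoped.newReader().read()); // [alerts-from-call-2]
        System.out.println(scoped.newReader().read()); // [alerts-from-call-3]
    }
}
```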

Uses of JobExecutionDecider in Spring Batch split flow using SimpleAsyncTaskExecutor

I want to configure a Spring Batch job with 4 steps. Step 2 and Step 3 are independent of each other, so I want to execute them in parallel. Either or both of these steps can be skipped depending on an execution parameter. The flow is shown below:
[Figure: batch flow details]
The Java configuration is as follows:
@Bean
public Job sampleBatchJob()
throws Exception {
final Flow step1Flow = new FlowBuilder<SimpleFlow>("step1Flow")
.from(step1Tasklet()).end();
final Flow step2Flow = new FlowBuilder<SimpleFlow>("step2Flow")
.from(new Step2FlowDecider()).on("EXECUTE").to(step2MasterStep())
.from(new Step2FlowDecider()).on("SKIP").end(ExitStatus.COMPLETED.getExitCode())
.build();
final Flow step3Flow = new FlowBuilder<SimpleFlow>("step3Flow")
.from(new Step3FlowDecider()).on("EXECUTE").to(step3MasterStep())
.from(new Step3FlowDecider()).on("SKIP").end(ExitStatus.COMPLETED.getExitCode())
.build();
final Flow splitFlow = new FlowBuilder<Flow>("splitFlow")
.split(new SimpleAsyncTaskExecutor())
.add(step2Flow, step3Flow)
.build();
return jobBuilderFactory().get("sampleBatchJob")
.start(step1Flow)
.next(splitFlow)
.next(step4MasterStep())
.end()
.build();
}
Sample code for Step2FlowDecider:
public class Step2FlowDecider
implements JobExecutionDecider {
@Override
public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
if (StringUtils.equals("Y", batchParameter.executeStep2())) {
return new FlowExecutionStatus("EXECUTE");
}
return new FlowExecutionStatus("SKIP");
}
}
With this configuration, the batch fails when I try to execute it, without any detailed error log.

Getting an Error like this - "jobParameters cannot be found on object of type BeanExpressionContext"

We're creating a Spring Batch app that reads data from one database and writes to another. In this process, we need to set the SQL parameters dynamically, because the data to fetch depends on them.
For this, we created a JdbcCursorItemReader with @StepScope, as I found in other articles and tutorials, but without success. The chunk reader in our job actually uses a peekable reader, which internally uses the JdbcCursorItemReader to perform the actual read.
When the job is triggered, we get the error "jobParameters cannot be found on object of type BeanExpressionContext".
Please let me know what I am doing wrong in the bean configuration below.
@Bean
@StepScope
@Scope(proxyMode = ScopedProxyMode.TARGET_CLASS)
public JdbcCursorItemReader<DTO> jdbcDataReader(@Value() String param) throws Exception {
JdbcCursorItemReader<DTO> databaseReader = new JdbcCursorItemReader<DTO>();
return databaseReader;
}
// This class extends PeekableReader, and sets JdbcReader (jdbcDataReader) as delegate
@Bean
public DataPeekReader getPeekReader() {
DataPeekReader peekReader = new DataPeekReader();
return peekReader;
}
// This is the reader that uses Peekable Item Reader (getPeekReader) and also specifies chunk completion policy.
@Bean
public DataReader getDataReader() {
DataReader dataReader = new DataReader();
return dataReader;
}
// This is the step builder.
@Bean
public Step readDataStep() throws Exception {
return stepBuilderFactory.get("readDataStep")
.<DTO, DTO>chunk(getDataReader())
.reader(getDataReader())
.writer(getWriter())
.build();
}
@Bean
public Job readReconDataJob() throws Exception {
return jobBuilderFactory.get("readDataJob")
.incrementer(new RunIdIncrementer())
.flow(readDataStep())
.end()
.build();
}
Your jdbcDataReader(@Value() String param) is incorrect. You need to specify a SpEL expression in the @Value to say which parameter to inject. Here is an example of how to pass a job parameter to a JdbcCursorItemReader:
@Bean
@StepScope
public JdbcCursorItemReader<DTO> jdbcCursorItemReader(@Value("#{jobParameters['table']}") String table) {
return new JdbcCursorItemReaderBuilder<DTO>()
.sql("select * from " + table)
// set other properties
.build();
}
You can find more details in the late binding section of the reference documentation.

How do you start each SpringBatch step with different parameters?

I am new to Spring Batch. I am trying to read about 2000 records from a CSV file every 10 seconds, using a Quartz scheduler, and write them into a database.
The problem is that every time it starts reading the file from the beginning, and hence writes the same set of records into the database.
I've tried changing the linesToSkip value (setLinesToSkip) dynamically, but to no avail, probably because it is set in my default bean definition.
Is there some way to resume processing from the same spot, or to update the value passed to setLinesToSkip?
@Bean
public Step stepOne() {
return stepBuilderFactory
.get("stepOne")
.<Stock,Stock>chunk(5)
.reader(reader())
.processor(processor())
.writer(writer())
.build();
}
@Bean
public Job readCSVFileJob1() {
return jobBuilderFactory
.get("readCSVFileJob1")
.incrementer(new RunIdIncrementer())
.start(stepOne())
.build();
}
@Bean
public ItemProcessor<Stock, Stock> processor(){
return new DBLogProcessor();
}
@Bean
public FlatFileItemReader<Stock> reader() {
FlatFileItemReader<Stock> itemReader = new FlatFileItemReader<Stock>();
itemReader.setLineMapper(lineMapper());
itemReader.setLinesToSkip(1);
itemReader.setMaxItemCount(2000);
itemReader.setResource(new FileSystemResource("example.csv"));
return itemReader;
}
