How do you start each Spring Batch step with different parameters?

I am new to Spring Batch. Using a Quartz scheduler, I am trying to read about 2000 records from a CSV file every 10 seconds and write them into a database.
The problem is that every time it starts reading the file from the beginning, so it writes the same set of records into the database again.
I've tried changing the parameter "setLinesToSkip" dynamically, but to no avail, which is probably because it is fixed in my default bean definition.
Is there some way I can resume processing from the same spot, or update the value passed to setLinesToSkip?
@Bean
public Step stepOne() {
    return stepBuilderFactory
            .get("stepOne")
            .<Stock, Stock>chunk(5)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}

@Bean
public Job readCSVFileJob1() {
    return jobBuilderFactory
            .get("readCSVFileJob1")
            .incrementer(new RunIdIncrementer())
            .start(stepOne())
            .build();
}

@Bean
public ItemProcessor<Stock, Stock> processor() {
    return new DBLogProcessor();
}

@Bean
public FlatFileItemReader<Stock> reader() {
    FlatFileItemReader<Stock> itemReader = new FlatFileItemReader<Stock>();
    itemReader.setLineMapper(lineMapper());
    itemReader.setLinesToSkip(1);
    itemReader.setMaxItemCount(2000);
    itemReader.setResource(new FileSystemResource("example.csv"));
    return itemReader;
}
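One possible direction (not from the original post): because the reader above is a plain singleton @Bean, calling setLinesToSkip later has no effect. A minimal sketch of a @StepScope reader whose offset is late-bound from a job parameter, assuming the scheduler passes the number of lines already imported under a hypothetical "lines.to.skip" parameter:

@Bean
@StepScope
public FlatFileItemReader<Stock> reader(
        @Value("#{jobParameters['lines.to.skip']}") Long linesToSkip) {
    // A fresh reader instance is created for every step execution,
    // so the offset supplied with the job parameters is honoured on each run.
    FlatFileItemReader<Stock> itemReader = new FlatFileItemReader<>();
    itemReader.setLineMapper(lineMapper());
    itemReader.setLinesToSkip(linesToSkip.intValue()); // hypothetical job parameter
    itemReader.setMaxItemCount(2000);
    itemReader.setResource(new FileSystemResource("example.csv"));
    return itemReader;
}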

Related

Spring Batch - delete

How can I do the deletion of the entities that I just persisted?
@Bean
public Job job() {
    return this.jobBuilderFactory.get("job")
            .start(this.syncStep())
            .build();
}

@Bean
public Step syncStep() {
    // read
    RepositoryItemReader<Element1> reader = new RepositoryItemReader<>();
    reader.setRepository(repository);
    reader.setMethodName("findElements");
    reader.setArguments(new ArrayList<>(Arrays.asList(ZonedDateTime.now())));
    final HashMap<String, Sort.Direction> sorts = new HashMap<>();
    sorts.put("uid", Sort.Direction.ASC);
    reader.setSort(sorts);
    // write
    RepositoryItemWriter<Element1> writer = new RepositoryItemWriter<>();
    writer.setRepository(otherrepository);
    writer.setMethodName("save");
    return stepBuilderFactory.get("syncStep")
            .<Element1, Element2>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
It is a process of dumping elements. We pass the elements from one table to another.
You can do that in two steps. The first step copies items from one table to another. The second step deletes the items from the source table. The second step should be executed only if the first step succeeds.
There are a few options:
Using a CompositeItemWriter
You could create a second ItemWriter that does the delete logic, for example:
RepositoryItemWriter<Element1> deleteWriter = new RepositoryItemWriter<>();
deleteWriter.setRepository(repository);
deleteWriter.setMethodName("delete");
To execute both writers you can use a CompositeItemWriter:
CompositeItemWriter<Element1> writer = new CompositeItemWriter<>();
// 'saveWriter' would be the writer you currently have
writer.setDelegates(List.of(saveWriter, deleteWriter));
This however won't work if your ItemProcessor transforms the original entity to something completely new. In that case I suggest using PropertyExtractingDelegatingItemWriter.
(Note, according to this question the writers run sequentially and the second writer should not be executed if the first one fails, but I'm not 100% sure on that.)
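A hedged sketch of that delegating-writer idea (not part of the original answer): assuming the processor maps Element1 to an Element2 that still carries the source id in a hypothetical "sourceUid" property, and the source repository exposes deleteById, the writer could extract that property and pass it to the delete method:

// Hypothetical: delete the source row using a property of the transformed item.
PropertyExtractingDelegatingItemWriter<Element2> deleteWriter =
        new PropertyExtractingDelegatingItemWriter<>();
deleteWriter.setTargetObject(repository);                 // the source repository
deleteWriter.setTargetMethod("deleteById");               // assumed repository method
deleteWriter.setFieldsUsedAsTargetMethodArguments(new String[] {"sourceUid"}); // hypothetical property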
Using a separate Step
Alternatively, you could put the new writer in an entirely separate Step:
@Bean
public Step cleanupStep() {
    // Same reader as before (might want to put this in a separate @Bean)
    RepositoryItemReader<Element1> reader = new RepositoryItemReader<>();
    // ...
    // The 'deleteWriter' from before
    RepositoryItemWriter<Element1> deleteWriter = new RepositoryItemWriter<>();
    // ...
    return stepBuilderFactory.get("cleanupStep")
            .<Element1, Element1>chunk(10)
            .reader(reader)
            .writer(deleteWriter)
            .build();
}
Now you can schedule the two steps individually:
@Bean
public Job job() {
    return this.jobBuilderFactory.get("job")
            .start(this.syncStep())
            .next(this.cleanupStep())
            .build();
}
Using a Tasklet
If you're using a separate step and depending on the amount of data, it might be more interesting to offload it entirely to the database and execute a single delete ... where ... query.
public class CleanupRepositoryTasklet implements Tasklet {

    private final Repository repository;

    public CleanupRepositoryTasklet(Repository repository) {
        this.repository = repository;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        repository.customDeleteMethod();
        return RepeatStatus.FINISHED;
    }
}
This Tasklet can then be registered in the same way as before, by declaring a new Step in your configuration:
return this.stepBuilderFactory.get("cleanupStep")
        .tasklet(myTasklet())
        .build();
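For completeness, a hedged sketch of how that tasklet bean might be wired (the myTasklet name and the injected repository are assumptions carried over from the snippets above):

@Bean
public CleanupRepositoryTasklet myTasklet() {
    // Constructor injection of the repository that runs the delete query.
    return new CleanupRepositoryTasklet(repository);
}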

Configure ItemWriter, ItemReader, Step and Job dynamically in Spring Batch

I'm working on a process which uses Spring Integration and Spring Batch:
1) Using Spring Integration, I poll a remote SFTP directory to get different CSV files as a Message.
2) The Message, which carries a CSV file as its payload, is sent downstream to a Transformer that transforms it into a JobLaunchRequest.
3) Spring Batch reads the CSV file and dumps it into the DB.
Question:
For each CSV file I need to configure an ItemReader, ItemWriter, Step and Job.
With that in mind, if I have to deal with 10 different CSV files, do I have to configure all 4 beans listed above for each CSV?
The CSVs differ in header names and header count, and each CSV maps to a different JPA entity.
Eventually I would have 40 @Bean configurations, which I think is bad.
Can anyone tell me whether this is how Spring Batch is meant to work, or whether there is a way to make one common dynamic bean for the different CSVs?
Here is the code:
IntegrationFlow:
@Bean
public IntegrationFlow integrationFlow(JobLaunchingGateway jobLaunchingGateway) {
    return IntegrationFlows.from(Sftp.inboundAdapter(sftpSessionFactory)
                    .remoteDirectory("/uploads")
                    .localDirectory(new File("C:\\Users\\DELL\\Desktop\\local"))
                    .patternFilter("*.csv")
                    .autoCreateLocalDirectory(true),
            c -> c.poller(Pollers.fixedRate(1000).taskExecutor(taskExecutor()).maxMessagesPerPoll(1)))
            .transform(fileMessageToJobRequest())
            .handle(jobLaunchingGateway)
            .log(LoggingHandler.Level.WARN, "headers.id + ': ' + payload")
            .route(JobExecution.class, j -> j.getStatus().isUnsuccessful() ? "jobFailedChannel" : "jobSuccessfulChannel")
            .get();
}
Transformer:
@Transformer
public JobLaunchRequest toRequest(Message<File> message) {
    JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
    jobParametersBuilder.addString(fileParameterName, message.getPayload().getAbsolutePath());
    jobParametersBuilder.addLong("key.id", System.currentTimeMillis());
    return new JobLaunchRequest(job, jobParametersBuilder.toJobParameters());
}
Batch Job:
@Bean
public Job vendorMasterBatchJob(Step vendorMasterStep) {
    return jobBuilderFactory.get("vendorMasterBatchJob")
            .incrementer(new RunIdIncrementer())
            .start(vendorMasterStep)
            .listener(deleteInputFileJobListener)
            .build();
}
Batch Step:
@Bean
public Step vendorMasterStep(FlatFileItemReader<ERPVendorMaster> vendorMasterReader,
                             JpaItemWriter<ERPVendorMaster> vendorMasterWriter) {
    return stepBuilderFactory.get("vendorMasterStep")
            .<ERPVendorMaster, ERPVendorMaster>chunk(chunkSize)
            .reader(vendorMasterReader)
            .writer(vendorMasterWriter)
            .faultTolerant()
            .skipLimit(Integer.MAX_VALUE)
            .skip(RuntimeException.class)
            .listener(skipListener)
            .build();
}
ItemWriter:
@Bean
public JpaItemWriter<ERPVendorMaster> vendorMasterWriter() {
    return new JpaItemWriterBuilder<ERPVendorMaster>()
            .entityManagerFactory(entityManagerFactory)
            .build();
}
ItemReader:
@Bean
@StepScope
public FlatFileItemReader<ERPVendorMaster> vendorMasterReader(@Value("#{jobParameters['input.file.name']}") String fileName) {
    return new FlatFileItemReaderBuilder<ERPVendorMaster>()
            .name("vendorMasterItemReader")
            .resource(new FileSystemResource(fileName))
            .linesToSkip(1)
            .delimited()
            .names(commaSeparatedVendorMasterHeaderValues.split(","))
            .fieldSetMapper(new BeanWrapperFieldSetMapper<ERPVendorMaster>() {{
                setConversionService(stringToDateConversionService());
                setTargetType(ERPVendorMaster.class);
            }})
            .build();
}
I'm very new to Spring Boot; any help will be appreciated.
Thanks
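One possible direction (not from the original thread): since the reader is already @StepScope, the file-specific details can be late-bound from job parameters as well, so a single generic reader bean could serve every CSV. A minimal sketch, assuming the transformer adds hypothetical "header.names" and "target.type" parameters next to the file name:

@Bean
@StepScope
public FlatFileItemReader<Object> dynamicCsvReader(
        @Value("#{jobParameters['input.file.name']}") String fileName,
        @Value("#{jobParameters['header.names']}") String headerNames, // assumed parameter
        @Value("#{jobParameters['target.type']}") String targetType)   // assumed parameter
        throws ClassNotFoundException {
    BeanWrapperFieldSetMapper<Object> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(Class.forName(targetType)); // e.g. the JPA entity for this CSV
    return new FlatFileItemReaderBuilder<Object>()
            .name("dynamicCsvReader")
            .resource(new FileSystemResource(fileName))
            .linesToSkip(1)
            .delimited()
            .names(headerNames.split(","))
            .fieldSetMapper(fieldSetMapper)
            .build();
}

The write side could in principle be shared the same way, since a JpaItemWriter dispatches on the runtime type of the entity it is given.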

How to make the taskExecutor spawn threads only after reading a single CSV when using MultiResourceItemReader in Spring Batch

I am trying to read multiple files in Spring Batch using MultiResourceItemReader, and I also have a taskExecutor so that the records in each file are read in multiple threads. Suppose there are 3 CSV files in a folder. MultiResourceItemReader should process them one by one, but because I have a taskExecutor, different threads pick up the CSV files, so two CSV files from the same folder are taken up and executed at the same time.
Expectation:
MultiResourceItemReader should read the first file, and then the taskExecutor should spawn different threads and execute it. Then another file should be picked up and handled by the taskExecutor.
Code snippet (batch configuration):
@Bean
public Step Step1() {
    return stepBuilderFactory.get("Step1")
            .<POJO, POJO>chunk(5)
            .reader(multiResourceItemReader())
            .writer(writer())
            .taskExecutor(taskExecutor()).throttleLimit(throttleLimit)
            .build();
}

@Bean
public MultiResourceItemReader<POJO> multiResourceItemReader() {
    MultiResourceItemReader<POJO> resourceItemReader = new MultiResourceItemReader<POJO>();
    ClassLoader cl = this.getClass().getClassLoader();
    ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver(cl);
    Resource[] resources;
    try {
        resources = resolver.getResources("file:/temp/*.csv");
        resourceItemReader.setResources(resources);
        resourceItemReader.setDelegate(itemReader());
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return resourceItemReader;
}
Maybe you should check the Partitioner (https://docs.spring.io/spring-batch/docs/current/reference/html/scalability.html#partitioning) and implement something like the following:
@Bean
public Step mainStep(StepBuilderFactory stepBuilderFactory,
                     FlatFileItemReader itemReader,
                     ListDelegateWriter listDelegateWriter,
                     BatchProperties batchProperties) {
    return stepBuilderFactory.get(Steps.MAIN)
            .<POJO, POJO>chunk(pageSize)
            .reader(itemReader)
            .writer(listDelegateWriter)
            .build();
}

@Bean
public TaskExecutor jobTaskExecutor(@Value("${batch.config.core-pool-size}") Integer corePoolSize,
                                    @Value("${batch.config.max-pool-size}") Integer maxPoolSize) {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(corePoolSize);
    taskExecutor.setMaxPoolSize(maxPoolSize);
    taskExecutor.afterPropertiesSet();
    return taskExecutor;
}

@Bean
@StepScope
public Partitioner partitioner(@Value("#{jobExecutionContext[ResourcesToRead]}") String[] resourcePaths,
                               @Value("${batch.config.grid-size}") Integer gridSize) {
    Resource[] resourceList = Arrays.stream(resourcePaths)
            .map(FileSystemResource::new)
            .toArray(Resource[]::new);
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    partitioner.setResources(resourceList);
    partitioner.partition(gridSize);
    return partitioner;
}

@Bean
public Step masterStep(StepBuilderFactory stepBuilderFactory, Partitioner partitioner, Step mainStep, TaskExecutor jobTaskExecutor) {
    return stepBuilderFactory.get(BatchConstants.MASTER)
            .partitioner(Steps.MAIN, partitioner)
            .step(mainStep)
            .taskExecutor(jobTaskExecutor)
            .build();
}
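One detail not shown in the snippet above (a hedged addition): MultiResourcePartitioner puts each partition's resource into the step execution context under the key "fileName", so the worker step's reader should be @StepScope and pick its resource up from there, for example:

@Bean
@StepScope
public FlatFileItemReader<POJO> itemReader(
        @Value("#{stepExecutionContext['fileName']}") Resource resource) {
    // Each partition (and therefore each thread) gets its own reader bound to a single file.
    return new FlatFileItemReaderBuilder<POJO>()
            .name("itemReader")
            .resource(resource)
            .lineMapper(lineMapper()) // assumed helper from the existing configuration
            .build();
}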

Spring batch Job Failing

All the steps pass, yet the job completes with a FAILED status.
@Bean
public Job job() {
    return this.jobBuilderFactory.get("person-job")
            .start(initializeBatch())
            .next(readBodystep())
            .on("STOPPED")
            .stopAndRestart(initializeBatch())
            .end()
            .validator(batchJobParamValidator)
            .incrementer(jobParametersIncrementer)
            .listener(jobListener)
            .build();
}

@Bean
public Flow preProcessingFlow() {
    return new FlowBuilder<Flow>("preProcessingFlow")
            .start(extractFooterAndBodyStep())
            .next(readFooterStep())
            .build();
}

@Bean
public Step initializeBatch() {
    return this.stepBuilderFactory.get("initializeBatch")
            .flow(preProcessingFlow())
            .build();
}

@Bean
public Step readBodystep() {
    return this.stepBuilderFactory.get("readChunkStep")
            .<PersonDTO, PersonBO>chunk(10)
            .reader(personFileBodyReader)
            .processor(itemProcessor())
            .writer(dummyWriter)
            .listener(new ReadFileStepListener())
            .listener(personFileBodyReader)
            .build();
}
Is anything wrong with the above configuration?
When I remove the stopAndRestart configuration, it passes.
For your use case, it is not stopAndRestart that you need; it is rather setting allowStartIfComplete on the step. With that, if the job fails, the step will be re-executed even if it completed successfully in the previous run.
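For illustration, a minimal sketch of that flag on the step builder (applied here to the initializeBatch step from the question):

@Bean
public Step initializeBatch() {
    return this.stepBuilderFactory.get("initializeBatch")
            .flow(preProcessingFlow())
            .allowStartIfComplete(true) // re-run this step on a restart even if it completed before
            .build();
}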

How to separate steps into classes in Spring Batch

I have tried to find the solution but I cannot... ㅠㅠ
I want to separate the steps in a job like below:
step1.class -> step2.class -> step3.class -> done
The reason I split them up like this is that I have to run different queries in each step.
@Bean
public Job bundleJob() {
    return jobBuilderFactory.get(JOB_NAME)
            .start(step1)   // bean
            .next(step2)    // bean
            .next(step3())  // and here is the code, e.g. reader, processor, writer
            .build();
}
My purpose is to use the data returned by step1 and step2,
but the jpaItemReader seems to work asynchronously, so it doesn't process in the order above.
The debug flow looks like this:
readerStep1 -> writerStep1 -> readerStep2 -> writerStep2 -> readerStep3 -> writerStep3
and
-> processorStep1 -> processorStep2 -> processorStep3
That is the big problem for me.
How can I make each step in the job wait for the previous one, including the querying?
Aha! I got it.
The point is how the beans are created in the configuration.
I had annotated all of my steps as beans, so they were all created eagerly by Spring.
The solution is late binding with @JobScope or @StepScope:
@Bean
@StepScope // late binding: the bean is created when the step runs
public ListItemReader<Dto> itemReader() {
    // business logic
    return new ListItemReader<>(dto);
}
To have separate steps in your job, you can use a Flow with a TaskletStep. Here is a snippet for your reference:
@Bean
public Job processJob() throws Exception {
    Flow fetchData = new FlowBuilder<Flow>("fetchData")
            .start(fetchDataStep()).build();
    Flow transformData = new FlowBuilder<Flow>("transformData")
            .start(transformDataStep()).build();
    Job job = jobBuilderFactory.get("processTenantLifeCycleJob").incrementer(new RunIdIncrementer())
            .start(fetchData).next(transformData).next(processData()).end()
            .listener(jobCompletionListener()).build();
    ReferenceJobFactory referenceJobFactory = new ReferenceJobFactory(job);
    registry.register(referenceJobFactory);
    return job;
}

@Bean
public TaskletStep fetchDataStep() {
    return stepBuilderFactory.get("fetchData")
            .tasklet(fetchDataValue()).listener(fetchDataStepListener()).build();
}

@Bean
@StepScope
public FetchDataValue fetchDataValue() {
    return new FetchDataValue();
}

@Bean
public TaskletStep transformDataStep() {
    return stepBuilderFactory.get("transformData")
            .tasklet(transformValue()).listener(sendReportDataCompletionListener()).build();
}

@Bean
@StepScope
public TransformValue transformValue() {
    return new TransformValue();
}

@Bean
public Step processData() {
    return stepBuilderFactory.get("processData").<String, Data>chunk(chunkSize)
            .reader(processDataReader()).processor(dataProcessor()).writer(processDataWriter())
            .listener(processDataListener())
            .taskExecutor(backupTaskExecutor()).build();
}
In this example I have used two Flows to fetch and transform data, each backed by its own tasklet class.
In order to use the values produced in steps 1 and 2, you can store them in the job execution context and retrieve them in the processData step, which has a reader, processor and writer.
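A minimal sketch of that hand-off, assuming the fetch tasklet stores a value under a hypothetical "fetchedValue" key and promotes it to the job execution context with an ExecutionContextPromotionListener:

// Inside the FetchDataValue tasklet: put the value into the step execution context.
chunkContext.getStepContext().getStepExecution()
        .getExecutionContext().put("fetchedValue", value);

// Promotion listener: register it on fetchDataStep with .listener(promotionListener())
// so the key is copied into the job execution context when the step completes.
@Bean
public ExecutionContextPromotionListener promotionListener() {
    ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
    listener.setKeys(new String[] {"fetchedValue"});
    return listener;
}

// In a later @StepScope bean, the promoted value can be late-bound:
@Bean
@StepScope
public ItemProcessor<String, Data> dataProcessor(
        @Value("#{jobExecutionContext['fetchedValue']}") String fetchedValue) {
    return item -> new Data(item, fetchedValue); // hypothetical Data constructor
}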
