Spring batch chunk size - spring

I am new in spring batch and I am stuck in something basic I think.
I create a Job configuration like this:
//reader
#Bean
public ItemReader<UnprocessedTrek> atReader() {
//AnalyzeTrekItemReader reader = new AnalyzeTrekItemReader();
JdbcCursorItemReader<UnprocessedTrek> reader = new JdbcCursorItemReader<UnprocessedTrek>();
reader.setSql("SELECT * FROM " + UnprocessedTrek.TBL_NAME);
reader.setRowMapper(new UnprocessedTrekRowMapper());
reader.setDataSource(rntDataSource);
reader.setFetchSize(0);
return reader;
}
//processor
#Bean
public ItemProcessor<UnprocessedTrek, Document> atProcessor()
{
AnalyzeTrekItemProcessor processor = new AnalyzeTrekItemProcessor();
return processor;
}
//writer
#Bean
public ItemWriter<Document> atWriter()
{
AnalyzeTrekItemWriter writer = new AnalyzeTrekItemWriter();
return writer;
}
#Bean
public Step analyzeTrek()
{
return steps.get("analyzeTrek")
.<UnprocessedTrek, Document> chunk(50)
.reader(atReader())
.processor(atProcessor())
.writer(atWriter())
.build();
}
My problem is that when the size of items processed is inferior as 50 the writer is not called. What I am missing in my configuration ?
Thanks for your help.

Related

spring batch flatfileitemwriter for multifile of large data

i use chunk for write files. i have two tables files and datas
config.java
public ListItemReader<> reader(String fileName) {
listItemReader = selectDataOfFileFromDB(fileName);
....
return listItemReader;
}
public FlatFileItemWriter<> writer(fileName) {
FlatFileItemWriter<> delegate = new FlatFileItemWriterBuilder<>
.name(fileName+XXX)
.resource(new FileSystemResource("/xxx/xxx/xxx/"+ fileName)).build();
return delegate;
}
public Step xxxxStep(fileName) {
return stepBuilderFactory.get("xxxxstep" + XXXX)
.reader(reader(fileName))
.writer(writer(fileName)).build();
}
#Bean
public Job xxxJob() {
List<fileName> list = selectFileNameFromDB();
JobBuilder xx = jobBuilderFactory.get("XXXXjob");
SimpleJobBuilder a = null;
a = xx.start(xxxxStep(list.get(0)));
a.next(xxxxStep(list.get(1)))
a.next(xxxxStep(list.get(2))
a.next(xxxxStep(list.get(3))
.....
a.next(xxxxStep(list.get(n))
}
I can write data to each of file but it not smart. any other solution is?
I try the classifiercompositeitemwriter but not suitable!

How to specify the taskexecutor to spawn thread after reading a single csv by using MultiResourceItemReader in spring batch

Trying to read multiple files in spring batch using MultiResourceItemReader and also have taskExecutor for the records in each file to read in multithreading. Suppose there are 3 csv files in a folder, MultiResourceItemReader should execute it one by one but as I have taskExecutor , different threads takes up csv files like two csv files are taken up by the threads from the same folder and starts executing .
Expectation:-
MultiResourceItemReader should read first file and then taskExecutor should spawn different threads and execute. Then another file should be picked up and should be taken by taskExecutor for execution.
Code Snippet/Batch_Configuration :-
#Bean
public Step Step1() {
return stepBuilderFactory.get("Step1")
.<POJO, POJO>chunk(5)
.reader(multiResourceItemReader())
.writer(writer())
.taskExecutor(taskExecutor()).throttleLimit(throttleLimit)
.build();
}
#Bean
public MultiResourceItemReader<POJO> multiResourceItemReader()
{
MultiResourceItemReader<POJO> resourceItemReader = new MultiResourceItemReader<POJO>();
ClassLoader cl = this.getClass().getClassLoader();
ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver(cl);
Resource[] resources;
try {
resources = resolver.getResources("file:/temp/*.csv");
resourceItemReader.setResources(resources);
resourceItemReader.setDelegate(itemReader());
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return resourceItemReader;
}
maybe you should check Partitioner (https://docs.spring.io/spring-batch/docs/current/reference/html/scalability.html#partitioning) and implement something like the following:
#Bean
public Step mainStep(StepBuilderFactory stepBuilderFactory,
FlatFileItemReader itemReader,
ListDelegateWriter listDelegateWriter,
BatchProperties batchProperties) {
return stepBuilderFactory.get(Steps.MAIN)
.<POJO, POJO>chunk(pageSize)
.reader(unmatchedItemReader)
.writer(listDelegateWriter)
.build();
}
#Bean
public TaskExecutor jobTaskExecutor(#Value("${batch.config.core-pool-size}") Integer corePoolSize,
#Value("${batch.config.max-pool-size}") Integer maxPoolSize) {
ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
taskExecutor.setCorePoolSize(corePoolSize);
taskExecutor.setMaxPoolSize(maxPoolSize);
taskExecutor.afterPropertiesSet();
return taskExecutor;
}
#Bean
#StepScope
public Partitioner partitioner(#Value("#{jobExecutionContext[ResourcesToRead]}") String[] resourcePaths,
#Value("${batch.config.grid-size}") Integer gridSize) {
Resource[] resourceList = Arrays.stream(resourcePaths)
.map(FileSystemResource::new)
.toArray(Resource[]::new);
MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
partitioner.setResources(resourceList);
partitioner.partition(gridSize);
return partitioner;
}
#Bean
public Step masterStep(StepBuilderFactory stepBuilderFactory, Partitioner partitioner, Step mainStep, TaskExecutor jobTaskExecutor) {
return stepBuilderFactory.get(BatchConstants.MASTER)
.partitioner(mainStep)
.partitioner(Steps.MAIN, partitioner)
.taskExecutor(jobTaskExecutor)
.build();
}

How to change my job configuration to add file name dynamically

I have a spring batch job which reads from a db then outputs to a multiple csv's. Inside my db I have a special column named divisionId. A CSV file should exist for every distinct value of divisionId. I split out the data using a ClassifierCompositeItemWriter.
At the moment I have an ItemWriter bean defined for every distinct value of divisionId. The beans are the same, it's only the file name that is different.
How can I change the configuration below to create a file with the divisionId automatically pre-pended to the file name without having to register a new ItemWriter for each divisionId?
I've been playing around with #JobScope and #StepScope annotations but can't get it right.
Thanks in advance.
#Bean
public Step readStgDbAndExportMasterListStep() {
return commonJobConfig.stepBuilderFactory
.get("readStgDbAndExportMasterListStep")
.<MasterList,MasterList>chunk(commonJobConfig.chunkSize)
.reader(commonJobConfig.queryStagingDbReader())
.processor(masterListOutputProcessor())
.writer(masterListFileWriter())
.stream((ItemStream) divisionMasterListFileWriter45())
.stream((ItemStream) divisionMasterListFileWriter90())
.build();
}
#Bean
public ItemWriter<MasterList> masterListFileWriter() {
BackToBackPatternClassifier classifier = new BackToBackPatternClassifier();
classifier.setRouterDelegate(new DivisionClassifier());
classifier.setMatcherMap(new HashMap<String, ItemWriter<? extends MasterList>>() {{
put("45", divisionMasterListFileWriter45());
put("90", divisionMasterListFileWriter90());
}});
ClassifierCompositeItemWriter<MasterList> writer = new ClassifierCompositeItemWriter<MasterList>();
writer.setClassifier(classifier);
return writer;
}
#Bean
public ItemWriter<MasterList> divisionMasterListFileWriter45() {
FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
writer.setResource(new FileSystemResource(new File(commonJobConfig.outDir, "45_masterList" + "" + ".csv")));
writer.setHeaderCallback(masterListFlatFileHeaderCallback());
writer.setLineAggregator(masterListFormatterLineAggregator());
return writer;
}
#Bean
public ItemWriter<MasterList> divisionMasterListFileWriter90() {
FlatFileItemWriter<MasterList> writer = new FlatFileItemWriter<>();
writer.setResource(new FileSystemResource(new File(commonJobConfig.outDir, "90_masterList" + "" + ".csv")));
writer.setHeaderCallback(masterListFlatFileHeaderCallback());
writer.setLineAggregator(masterListFormatterLineAggregator());
return writer;
}
I came up with a pretty complex way of doing this. I followed a tutorial at https://github.com/langmi/spring-batch-examples/wiki/Rename-Files.
The premise is to use the step execution context to place the file name in it.

How to make writer initiated for every job instance in spring batch

I am writing a spring batch job. I am implementing custom writer using KafkaClientWriter extends AbstractItemStreamItemWriter<ProducerMessage>
I have fields which need to be unique for each instance. But I could see this class initiated only once. Rest jobs have same instance of writer class.
Where as my custom readers and processors are getting initiated for each job.
Below is my job configurations. How can I achieve the same behavior for writer as well?
#Bean
#Scope("job")
public ZipMultiResourceItemReader reader(#Value("#{jobParameters[fileName]}") String fileName, #Value("#{jobParameters[s3SourceFolderPrefix]}") String s3SourceFolderPrefix, #Value("#{jobParameters[timeStamp]}") long timeStamp, com.fastretailing.catalogPlatformSCMProducer.service.ConfigurationService confService) {
FlatFileItemReader faltFileReader = new FlatFileItemReader();
ZipMultiResourceItemReader zipReader = new ZipMultiResourceItemReader();
Resource[] resArray = new Resource[1];
resArray[0] = new FileSystemResource(new File(fileName));
zipReader.setArchives(resArray);
DefaultLineMapper<ProducerMessage> lineMapper = new DefaultLineMapper<ProducerMessage>();
lineMapper.setLineTokenizer(new DelimitedLineTokenizer());
CSVFieldMapper csvFieldMapper = new CSVFieldMapper(fileName, s3SourceFolderPrefix, timeStamp, confService);
lineMapper.setFieldSetMapper(csvFieldMapper);
faltFileReader.setLineMapper(lineMapper);
zipReader.setDelegate(faltFileReader);
return zipReader;
}
#Bean
#Scope("job")
public ItemProcessor<ProducerMessage, ProducerMessage> processor(#Value("#{jobParameters[timeStamp]}") long timeStamp) {
ProducerProcessor processor = new ProducerProcessor();
processor.setS3FileTimeStamp(timeStamp);
return processor;
}
#Bean
#ConfigurationProperties
public ItemWriter<ProducerMessage> writer() {
return new KafkaClientWriter();
}
#Bean
public Step step1(StepBuilderFactory stepBuilderFactory,
ItemReader reader, ItemWriter writer,
ItemProcessor processor, #Value("${reader.chunkSize}")
int chunkSize) {
LOGGER.info("Step configuration loaded with chunk size {}", chunkSize);
return stepBuilderFactory.get("step1")
.chunk(chunkSize).reader(reader)
.processor(processor).writer(writer)
.build();
}
#Bean
public StepScope stepScope() {
final StepScope stepScope = new StepScope();
stepScope.setAutoProxy(true);
return stepScope;
}
#Bean
public JobScope jobScope() {
final JobScope jobScope = new JobScope();
return jobScope;
}
#Bean
public Configuration configuration() {
return new Configuration();
}
I tried making the writer with job scope. But in that case open is not getting called. This is where I am doing some initializations.
When using java based configuration and a scoped proxy what happens is that the return type of the method is detected and for that a proxy is created. So when you return ItemWriter you will get a JDK proxy only implementing ItemWriter, whereas your open method is on the ItemStream interface. Because that interface isn't included on the proxy there is no way to call the method.
Either change the return type to KafkaClientWriter or ItemStreamWriter< ProducerMessage> (assuming the KafkaCLientWriter implements that method). Next add #Scope("job") and you should have your open method called again with a properly scoped writer.

Spring batch : Assemble a job rather than configuring it (Extensible job configuration)

Background
I am working on designing a file reading layer that can read delimited files and load it in a List. I have decided to use Spring Batch because it provides a lot of scalability options which I can leverage for different sets of files depending on their size.
The requirement
I want to design a generic Job API that can be used to read any delimited file.
There should be a single Job structure that should be used for parsing every delimited file. For example, if the system needs to read 5 files, there will be 5 jobs (one for each file). The only way the 5 jobs will be different from each other is that they will use a different FieldSetMapper, column name, directory path and additional scaling parameters such as commit-interval and throttle-limit.
The user of this API should not need to configure a Spring
batch job, step, chunking, partitioning, etc on his own when a new file type is introduced in the system.
All that the user needs to do is to provide the FieldsetMapperto be used by the job along with the commit-interval, throttle-limit and the directory where each type of file will be placed.
There will be one predefined directory per file. Each directory can contain multiple files of the same type and format. A MultiResourcePartioner will be used to look inside a directory. The number of partitions = number of files in the directory.
My requirement is to build a Spring Batch infrastructure that gives me a unique job I can launch once I have the bits and pieces that will make up the job.
My solution :
I created an abstract configuration class that will be extended by concrete configuration classes (There will be 1 concrete class per file to be read).
#Configuration
#EnableBatchProcessing
public abstract class AbstractFileLoader<T> {
private static final String FILE_PATTERN = "*.dat";
#Autowired
JobBuilderFactory jobs;
#Autowired
ResourcePatternResolver resourcePatternResolver;
public final Job createJob(Step s1, JobExecutionListener listener) {
return jobs.get(this.getClass().getSimpleName())
.incrementer(new RunIdIncrementer()).listener(listener)
.start(s1).build();
}
public abstract Job loaderJob(Step s1, JobExecutionListener listener);
public abstract FieldSetMapper<T> getFieldSetMapper();
public abstract String getFilesPath();
public abstract String[] getColumnNames();
public abstract int getChunkSize();
public abstract int getThrottleLimit();
#Bean
#StepScope
#Value("#{stepExecutionContext['fileName']}")
public FlatFileItemReader<T> reader(String file) {
FlatFileItemReader<T> reader = new FlatFileItemReader<T>();
String path = file.substring(file.indexOf(":") + 1, file.length());
FileSystemResource resource = new FileSystemResource(path);
reader.setResource(resource);
DefaultLineMapper<T> lineMapper = new DefaultLineMapper<T>();
lineMapper.setFieldSetMapper(getFieldSetMapper());
DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer(",");
tokenizer.setNames(getColumnNames());
lineMapper.setLineTokenizer(tokenizer);
reader.setLineMapper(lineMapper);
reader.setLinesToSkip(1);
return reader;
}
#Bean
public ItemProcessor<T, T> processor() {
// TODO add transformations here
return null;
}
#Bean
#JobScope
public ListItemWriter<T> writer() {
ListItemWriter<T> writer = new ListItemWriter<T>();
return writer;
}
#Bean
#JobScope
public Step readStep(StepBuilderFactory stepBuilderFactory,
ItemReader<T> reader, ItemWriter<T> writer,
ItemProcessor<T, T> processor, TaskExecutor taskExecutor) {
final Step readerStep = stepBuilderFactory
.get(this.getClass().getSimpleName() + " ReadStep:slave")
.<T, T> chunk(getChunkSize()).reader(reader)
.processor(processor).writer(writer).taskExecutor(taskExecutor)
.throttleLimit(getThrottleLimit()).build();
final Step partitionedStep = stepBuilderFactory
.get(this.getClass().getSimpleName() + " ReadStep:master")
.partitioner(readerStep)
.partitioner(
this.getClass().getSimpleName() + " ReadStep:slave",
partitioner()).taskExecutor(taskExecutor).build();
return partitionedStep;
}
/*
* #Bean public TaskExecutor taskExecutor() { return new
* SimpleAsyncTaskExecutor(); }
*/
#Bean
#JobScope
public Partitioner partitioner() {
MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
Resource[] resources;
try {
resources = resourcePatternResolver.getResources("file:"
+ getFilesPath() + FILE_PATTERN);
} catch (IOException e) {
throw new RuntimeException(
"I/O problems when resolving the input file pattern.", e);
}
partitioner.setResources(resources);
return partitioner;
}
#Bean
#JobScope
public JobExecutionListener listener(ListItemWriter<T> writer) {
return new JobCompletionNotificationListener<T>(writer);
}
/*
* Use this if you want the writer to have job scope (JIRA BATCH-2269). Also
* change the return type of writer to ListItemWriter for this to work.
*/
#Bean
public TaskExecutor taskExecutor() {
return new SimpleAsyncTaskExecutor() {
#Override
protected void doExecute(final Runnable task) {
// gets the jobExecution of the configuration thread
final JobExecution jobExecution = JobSynchronizationManager
.getContext().getJobExecution();
super.doExecute(new Runnable() {
public void run() {
JobSynchronizationManager.register(jobExecution);
try {
task.run();
} finally {
JobSynchronizationManager.close();
}
}
});
}
};
}
}
Let's say I have to read Invoice data for the sake of discussion. I can therefore extend the above class for creating an InvoiceLoader :
#Configuration
public class InvoiceLoader extends AbstractFileLoader<Invoice>{
private class InvoiceFieldSetMapper implements FieldSetMapper<Invoice> {
public Invoice mapFieldSet(FieldSet f) {
Invoice invoice = new Invoice();
invoice.setNo(f.readString("INVOICE_NO");
return e;
}
}
#Override
public FieldSetMapper<Invoice> getFieldSetMapper() {
return new InvoiceFieldSetMapper();
}
#Override
public String getFilesPath() {
return "I:/CK/invoices/partitions/";
}
#Override
public String[] getColumnNames() {
return new String[] { "INVOICE_NO", "DATE"};
}
#Override
#Bean(name="invoiceJob")
public Job loaderJob(Step s1,
JobExecutionListener listener) {
return createJob(s1, listener);
}
#Override
public int getChunkSize() {
return 25254;
}
#Override
public int getThrottleLimit() {
return 8;
}
}
Let's say I have one more class called Inventory that extends AbstractFileLoader.
On application startup, I can load these two annotation configurations as follows :
AbstractApplicationContext context1 = new AnnotationConfigApplicationContext(InvoiceLoader.class, InventoryLoader.class);
Somewhere else in my application two different threads can launch the jobs as follows :
Thread 1 :
JobLauncher jobLauncher1 = context1.getBean(JobLauncher.class);
Job job1 = context1.getBean("invoiceJob", Job.class);
JobExecution jobExecution = jobLauncher1.run(job1, jobParams1);
Thread 2 :
JobLauncher jobLauncher1 = context1.getBean(JobLauncher.class);
Job job1 = context1.getBean("inventoryJob", Job.class);
JobExecution jobExecution = jobLauncher1.run(job1, jobParams1);
The advantage of this approach is that everytime there is a new file to be read, all that the developer/user has to do is subclass AbstractFileLoader and implement the required abstract methods without the need to get into the details of how to assemble the job.
The questions :
I am new to Spring batch so I may have overlooked some of the not-so-obvious issues with this approach such as shared internal objects in Spring batch that may cause two jobs running together to fail or obvious issues such as scoping of the beans.
Is there a better way to achieve my objective?
The fileName attribute of the #Value("#{stepExecutionContext['fileName']}") is always being assigned the value as I:/CK/invoices/partitions/ which is the value returned by getPathmethod in InvoiceLoader even though the getPathmethod inInventoryLoader`returns a different value.
One option is passing them as job parameters. For instance:
#Bean
Job job() {
jobs.get("myJob").start(step1(null)).build()
}
#Bean
#JobScope
Step step1(#Value('#{jobParameters["commitInterval"]}') commitInterval) {
steps.get('step1')
.chunk((int) commitInterval)
.reader(new IterableItemReader(iterable: [1, 2, 3, 4], name: 'foo'))
.writer(writer(null))
.build()
}
#Bean
#JobScope
ItemWriter writer(#Value('#{jobParameters["writerClass"]}') writerClass) {
applicationContext.classLoader.loadClass(writerClass).newInstance()
}
With MyWriter:
class MyWriter implements ItemWriter<Integer> {
#Override
void write(List<? extends Integer> items) throws Exception {
println "Write $items"
}
}
Then executed with:
def jobExecution = launcher.run(ctx.getBean(Job), new JobParameters([
commitInterval: new JobParameter(3),
writerClass: new JobParameter('MyWriter'), ]))
Output is:
INFO: Executing step: [step1]
Write [1, 2, 3]
Write [4]
Feb 24, 2016 2:30:22 PM org.springframework.batch.core.launch.support.SimpleJobLauncher$1 run
INFO: Job: [SimpleJob: [name=myJob]] completed with the following parameters: [{commitInterval=3, writerClass=MyWriter}] and the following status: [COMPLETED]
Status is: COMPLETED, job execution id 0
#1 step1 COMPLETED
Full example here.

Resources