Spring Batch: FlatFileItemWriter to InputStream/byte array - spring-boot

So I've created a batch job which generates reports (CSV files). I have been able to generate the files seamlessly using FlatFileItemWriter, but my end goal is to produce an InputStream that I can pass to a REST service which stores the document, or a byte array that I can store in the database.
public class CustomWriter implements ItemWriter<Report> {

    @Override
    public void write(List<? extends Report> reports) throws Exception {
        for (Report report : reports) {
            writeDataToFile(report);
        }
    }

    private void writeDataToFile(final Report data) throws Exception {
        FlatFileItemWriter<Report> writer = new FlatFileItemWriter<>();
        writer.setResource(new FileSystemResource("C:/reports/test-report.csv"));
        writer.setLineAggregator(getLineAggregator());
        writer.afterPropertiesSet();
        writer.open(new ExecutionContext());
        writer.write(Collections.singletonList(data));
        writer.close();
    }

    private DelimitedLineAggregator<Report> getLineAggregator() {
        DelimitedLineAggregator<Report> delimitedLineAgg = new DelimitedLineAggregator<>();
        delimitedLineAgg.setDelimiter(",");
        delimitedLineAgg.setFieldExtractor(getFieldExtractor());
        return delimitedLineAgg;
    }

    private FieldExtractor<Report> getFieldExtractor() {
        BeanWrapperFieldExtractor<Report> fieldExtractor = new BeanWrapperFieldExtractor<>();
        fieldExtractor.setNames(COLUMN_HEADERS.toArray(new String[0]));
        return fieldExtractor;
    }
}
One way I could do this is to store the file locally temporarily and create a new step to pick the generated files up and do the sending/storing but I would really like to skip this step and send/store it in the first step.
How do I go about doing this?
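If the file on disk is only a means to an end, one option (a minimal sketch, not from the original post) is to drive the same DelimitedLineAggregator by hand and build the CSV content in memory, so it can be sent as an InputStream or stored as a byte[] in the same step:
// Sketch only: reuse the aggregator from the writer above and collect the
// CSV lines in memory instead of writing them to disk.
private byte[] toCsvBytes(final List<? extends Report> reports) {
    DelimitedLineAggregator<Report> aggregator = getLineAggregator();
    StringBuilder csv = new StringBuilder();
    for (Report report : reports) {
        csv.append(aggregator.aggregate(report)).append(System.lineSeparator());
    }
    return csv.toString().getBytes(StandardCharsets.UTF_8);
}
new ByteArrayInputStream(toCsvBytes(reports)) then yields the InputStream for the REST call, and the same byte[] can go straight to the database, with no temporary file and no second step.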

Related

How to get current resource name

I am using MultiResourceItemReader in order to read and eventually write a list of CSV files to the database.
@StepScope
@Bean
public MultiResourceItemReader<DailyExport> multiResourceItemReader(@Value("#{stepExecutionContext['listNotLoadedFilesPath']}") List<String> notLoadedFilesPath) {
    logger.info("** start multiResourceItemReader **");
    // convert the list of not-yet-loaded file paths into an array of resources
    List<Resource> tmpList = new ArrayList<>();
    notLoadedFilesPath.forEach(fullPath -> tmpList.add(new FileSystemResource(fullPath)));
    Resource[] resourceArr = tmpList.toArray(new Resource[tmpList.size()]);

    MultiResourceItemReader<DailyExport> multiResourceItemReader = new MultiResourceItemReader<>();
    multiResourceItemReader.setName("dailyExportMultiReader");
    multiResourceItemReader.setDelegate(reader(dailyExportMapper()));
    multiResourceItemReader.setResources(resourceArr);
    return multiResourceItemReader;
}
@Bean
public FlatFileItemReader<DailyExport> reader(FieldSetMapper<DailyExport> testClassRowMapper) {
    logger.info("** start reader **");
    // create the delegate reader instance
    FlatFileItemReader<DailyExport> reader = new FlatFileItemReaderBuilder<DailyExport>()
            .name("dailyExportReader")
            .linesToSkip(1)
            .fieldSetMapper(testClassRowMapper)
            .delimited().delimiter("|").names(dailyExportMetadata)
            .build();
    return reader;
}
Everything is working well, but I also need to store the current file/resource name.
I found the API getCurrentResource, but I couldn't figure out how to use it. Is there a way to get the current resource during the process stage?
public class DailyExportItemProcessor implements ItemProcessor<DailyExport, DailyExport> {

    @Autowired
    public MultiResourceItemReader<DailyExport> multiResourceItemReader;

    @Override
    public DailyExport process(DailyExport item) throws Exception {
        // multiResourceItemReader.getCurrentResource() ??
        return item;
    }
}
Thank you
ResourceAware is what you need; it lets you set the original resource on the item so you can access it in the processor (or anywhere else the item is in scope):
class DailyExport implements ResourceAware {

    private Resource resource;

    @Override
    public void setResource(Resource resource) { this.resource = resource; }

    public Resource getResource() { return resource; }
}
then in the processor:
public class DailyExportItemProcessor implements ItemProcessor<DailyExport, DailyExport> {

    @Override
    public DailyExport process(DailyExport item) throws Exception {
        Resource currentResource = item.getResource();
        // do something with the item/resource
        return item;
    }
}
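Note that MultiResourceItemReader does this wiring automatically: when an item read from the delegate implements ResourceAware, the reader calls setResource(...) on it with the current resource before handing it to the processor, so no further reader configuration is needed.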

FlatFileFooterCallback - how to get access to StepExecution For Count

I am reading from Oracle and writing to a CSV file. I have one step which reads and writes to the CSV file. I implemented a ChunkListener so I know how many records were written.
I want to be able to write a file trailer showing the number of records written to my file. I implemented FlatFileFooterCallback but cannot figure out how to get the data from StepExecution (the "readCount") to my FlatFileFooterCallback.
I guess I am struggling with how to get access to the job and step scopes in my writer.
Any examples or links would be helpful. I am using Spring Batch with Spring Boot, so everything is annotation-based. I can find XML examples, so maybe this annotated approach is more complicated.
ItemWriter<McsendRequest> databaseCsvItemWriter() {
    FlatFileItemWriter<McsendRequest> csvFileWriter = new FlatFileItemWriter<>();

    String exportFileHeader = "one,two,three";
    StringHeaderWriter headerWriter = new StringHeaderWriter(exportFileHeader);
    csvFileWriter.setHeaderCallback(headerWriter);

    String exportFilePath = "/tmp/students.csv";
    csvFileWriter.setResource(new FileSystemResource(exportFilePath));

    LineAggregator<McsendRequest> lineAggregator = createRequestLineAggregator();
    csvFileWriter.setLineAggregator(lineAggregator);

    // problem spot: the header writer is reused here, but a real footer callback is needed
    csvFileWriter.setFooterCallback(headerWriter);
    return csvFileWriter;
}
You can implement CustomFooterCallback as follows:
public class CustomFooterCallback implements FlatFileFooterCallback {

    @Value("#{stepExecution}")
    private StepExecution stepExecution;

    @Override
    public void writeFooter(Writer writer) throws IOException {
        writer.write("footer - number of items read: " + stepExecution.getReadCount());
        writer.write("footer - number of items written: " + stepExecution.getWriteCount());
    }
}
Then in a @Configuration class:
@Bean
@StepScope
public FlatFileFooterCallback customFooterCallback() {
    return new CustomFooterCallback();
}
And use it in the writer:
csvFileWriter.setFooterCallback(customFooterCallback());
This way, you have access to StepExecution in order to read data as needed.
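For context, a sketch of how this can plug into the writer bean from the question (the path and aggregator are the ones assumed above): because customFooterCallback() is step-scoped, Spring injects a scoped proxy, so a singleton writer can hold it and the StepExecution is resolved when the footer is actually written at the end of the step.
@Bean
public FlatFileItemWriter<McsendRequest> databaseCsvItemWriter() {
    FlatFileItemWriter<McsendRequest> csvFileWriter = new FlatFileItemWriter<>();
    csvFileWriter.setResource(new FileSystemResource("/tmp/students.csv"));
    csvFileWriter.setLineAggregator(createRequestLineAggregator());
    // scoped proxy: resolves to the current step's callback at footer-writing time
    csvFileWriter.setFooterCallback(customFooterCallback());
    return csvFileWriter;
}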

JobLaunchingGateway thread safety

I am trying to use Spring Integration with Spring Batch. I am transforming my Message<File> into a JobLaunchRequest:
public class FileMessageToJobRequest {

    private Job job;
    private String fileParameterName;

    public void setFileParameterName(String fileParameterName) {
        this.fileParameterName = fileParameterName;
    }

    public void setJob(Job job) {
        this.job = job;
    }

    public JobLaunchRequest toRequest(Message<File> message) throws IOException {
        JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
        String fileName = message.getPayload().getName();
        String currentDateTime = LocalDateTime.now().toString();
        logger.info("currentDateTime: " + currentDateTime);
        jobParametersBuilder.addString(this.fileParameterName, fileName);
        jobParametersBuilder.addString("currentDateTime", currentDateTime);
        return new JobLaunchRequest(job, jobParametersBuilder.toJobParameters());
    }
}
Here is my integration flow:
return IntegrationFlows
        .from(Files.inboundAdapter(new File("landing"))
                        .preventDuplicates()
                        .patternFilter("*.txt"),
                e -> e.poller(Pollers.fixedDelay(2500)
                        .maxMessagesPerPoll(15)
                        .taskExecutor(getFileProcessExecutor())))
        .transform(fileMessageToJobRequest)
        .handle(jobLaunchingGw(jobLauncher))
        .get();

@Bean
public MessageHandler jobLaunchingGw(JobLauncher jobLauncher) {
    return new JobLaunchingGateway(jobLauncher);
}
For some reason, when two files arrive at the same moment and are stamped with the same currentDateTime from
String currentDateTime = LocalDateTime.now().toString();
one of the two files is not getting processed. The files are very likely to arrive at the same time; I am using a ThreadPoolTaskExecutor with core pool size, max pool size, and queue capacity all set to 15. I am assuming this is some kind of race condition. I even tried synchronizing on the JobLaunchRequest. Is this a race condition? Why would a message drop?
I am using Spring Integration 4.2.6 and (almost) all the latest versions of Spring.
My reference for this config is this question: Java DSL config
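If the drop comes from two launches arriving with identical job parameters (Spring Batch identifies a JobInstance by its parameters, so a second launch with the same set is rejected rather than run again), one workaround is to make every launch unique. A minimal sketch; launchId is an illustrative parameter name, not from the original post:
// Sketch only: add a per-message unique parameter so two files that arrive in the
// same instant no longer produce identical JobParameters.
public JobLaunchRequest toRequest(Message<File> message) {
    JobParametersBuilder builder = new JobParametersBuilder();
    builder.addString(this.fileParameterName, message.getPayload().getName());
    // nanoTime distinguishes launches created within the same millisecond
    builder.addLong("launchId", System.nanoTime());
    return new JobLaunchRequest(job, builder.toJobParameters());
}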

Spring batch : Assemble a job rather than configuring it (Extensible job configuration)

Background
I am working on designing a file reading layer that can read delimited files and load them into a List. I have decided to use Spring Batch because it provides a lot of scalability options, which I can leverage for different sets of files depending on their size.
The requirement
I want to design a generic Job API that can be used to read any delimited file.
There should be a single Job structure that is used for parsing every delimited file. For example, if the system needs to read 5 kinds of files, there will be 5 jobs (one for each kind). The only way the 5 jobs will differ is that they will use a different FieldSetMapper, column names, directory path, and additional scaling parameters such as commit-interval and throttle-limit.
The user of this API should not need to configure a Spring Batch job, step, chunking, partitioning, etc. on his own when a new file type is introduced into the system.
All the user needs to do is provide the FieldSetMapper to be used by the job, along with the commit-interval, throttle-limit, and the directory where each type of file will be placed.
There will be one predefined directory per file type. Each directory can contain multiple files of the same type and format. A MultiResourcePartitioner will be used to look inside a directory. The number of partitions = the number of files in the directory.
My requirement is to build a Spring Batch infrastructure that gives me a unique job I can launch once I have the bits and pieces that will make up the job.
My solution:
I created an abstract configuration class that will be extended by concrete configuration classes (there will be one concrete class per type of file to be read).
@Configuration
@EnableBatchProcessing
public abstract class AbstractFileLoader<T> {

    private static final String FILE_PATTERN = "*.dat";

    @Autowired
    JobBuilderFactory jobs;

    @Autowired
    ResourcePatternResolver resourcePatternResolver;

    public final Job createJob(Step s1, JobExecutionListener listener) {
        return jobs.get(this.getClass().getSimpleName())
                .incrementer(new RunIdIncrementer()).listener(listener)
                .start(s1).build();
    }

    public abstract Job loaderJob(Step s1, JobExecutionListener listener);

    public abstract FieldSetMapper<T> getFieldSetMapper();

    public abstract String getFilesPath();

    public abstract String[] getColumnNames();

    public abstract int getChunkSize();

    public abstract int getThrottleLimit();

    @Bean
    @StepScope
    public FlatFileItemReader<T> reader(
            @Value("#{stepExecutionContext['fileName']}") String file) {
        FlatFileItemReader<T> reader = new FlatFileItemReader<T>();
        String path = file.substring(file.indexOf(":") + 1, file.length());
        FileSystemResource resource = new FileSystemResource(path);
        reader.setResource(resource);
        DefaultLineMapper<T> lineMapper = new DefaultLineMapper<T>();
        lineMapper.setFieldSetMapper(getFieldSetMapper());
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer(",");
        tokenizer.setNames(getColumnNames());
        lineMapper.setLineTokenizer(tokenizer);
        reader.setLineMapper(lineMapper);
        reader.setLinesToSkip(1);
        return reader;
    }

    @Bean
    public ItemProcessor<T, T> processor() {
        // TODO add transformations here
        return null;
    }

    @Bean
    @JobScope
    public ListItemWriter<T> writer() {
        ListItemWriter<T> writer = new ListItemWriter<T>();
        return writer;
    }

    @Bean
    @JobScope
    public Step readStep(StepBuilderFactory stepBuilderFactory,
            ItemReader<T> reader, ItemWriter<T> writer,
            ItemProcessor<T, T> processor, TaskExecutor taskExecutor) {
        final Step readerStep = stepBuilderFactory
                .get(this.getClass().getSimpleName() + " ReadStep:slave")
                .<T, T>chunk(getChunkSize()).reader(reader)
                .processor(processor).writer(writer).taskExecutor(taskExecutor)
                .throttleLimit(getThrottleLimit()).build();
        final Step partitionedStep = stepBuilderFactory
                .get(this.getClass().getSimpleName() + " ReadStep:master")
                .partitioner(readerStep)
                .partitioner(
                        this.getClass().getSimpleName() + " ReadStep:slave",
                        partitioner()).taskExecutor(taskExecutor).build();
        return partitionedStep;
    }

    /*
     * @Bean public TaskExecutor taskExecutor() { return new
     * SimpleAsyncTaskExecutor(); }
     */

    @Bean
    @JobScope
    public Partitioner partitioner() {
        MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
        Resource[] resources;
        try {
            resources = resourcePatternResolver.getResources("file:"
                    + getFilesPath() + FILE_PATTERN);
        } catch (IOException e) {
            throw new RuntimeException(
                    "I/O problems when resolving the input file pattern.", e);
        }
        partitioner.setResources(resources);
        return partitioner;
    }

    @Bean
    @JobScope
    public JobExecutionListener listener(ListItemWriter<T> writer) {
        return new JobCompletionNotificationListener<T>(writer);
    }

    /*
     * Use this if you want the writer to have job scope (JIRA BATCH-2269). Also
     * change the return type of writer to ListItemWriter for this to work.
     */
    @Bean
    public TaskExecutor taskExecutor() {
        return new SimpleAsyncTaskExecutor() {
            @Override
            protected void doExecute(final Runnable task) {
                // gets the jobExecution of the configuration thread
                final JobExecution jobExecution = JobSynchronizationManager
                        .getContext().getJobExecution();
                super.doExecute(new Runnable() {
                    public void run() {
                        JobSynchronizationManager.register(jobExecution);
                        try {
                            task.run();
                        } finally {
                            JobSynchronizationManager.close();
                        }
                    }
                });
            }
        };
    }
}
Let's say, for the sake of discussion, that I have to read Invoice data. I can therefore extend the above class to create an InvoiceLoader:
@Configuration
public class InvoiceLoader extends AbstractFileLoader<Invoice> {

    private class InvoiceFieldSetMapper implements FieldSetMapper<Invoice> {
        public Invoice mapFieldSet(FieldSet f) {
            Invoice invoice = new Invoice();
            invoice.setNo(f.readString("INVOICE_NO"));
            return invoice;
        }
    }

    @Override
    public FieldSetMapper<Invoice> getFieldSetMapper() {
        return new InvoiceFieldSetMapper();
    }

    @Override
    public String getFilesPath() {
        return "I:/CK/invoices/partitions/";
    }

    @Override
    public String[] getColumnNames() {
        return new String[] { "INVOICE_NO", "DATE" };
    }

    @Override
    @Bean(name = "invoiceJob")
    public Job loaderJob(Step s1, JobExecutionListener listener) {
        return createJob(s1, listener);
    }

    @Override
    public int getChunkSize() {
        return 25254;
    }

    @Override
    public int getThrottleLimit() {
        return 8;
    }
}
Let's say I have one more class called InventoryLoader that extends AbstractFileLoader.
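For the sake of discussion, a sketch of what InventoryLoader could look like (the Inventory fields, column names, and path here are illustrative, not from the original post):
@Configuration
public class InventoryLoader extends AbstractFileLoader<Inventory> {

    @Override
    public FieldSetMapper<Inventory> getFieldSetMapper() {
        // illustrative mapping; adjust to the real Inventory fields
        return fieldSet -> {
            Inventory inventory = new Inventory();
            inventory.setSku(fieldSet.readString("SKU"));
            return inventory;
        };
    }

    @Override
    public String getFilesPath() {
        return "I:/CK/inventory/partitions/";
    }

    @Override
    public String[] getColumnNames() {
        return new String[] { "SKU", "QUANTITY" };
    }

    @Override
    @Bean(name = "inventoryJob")
    public Job loaderJob(Step s1, JobExecutionListener listener) {
        return createJob(s1, listener);
    }

    @Override
    public int getChunkSize() {
        return 25254;
    }

    @Override
    public int getThrottleLimit() {
        return 8;
    }
}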
On application startup, I can load these two annotation configurations as follows:
AbstractApplicationContext context1 = new AnnotationConfigApplicationContext(InvoiceLoader.class, InventoryLoader.class);
Somewhere else in my application, two different threads can launch the jobs as follows:
Thread 1:
JobLauncher jobLauncher1 = context1.getBean(JobLauncher.class);
Job job1 = context1.getBean("invoiceJob", Job.class);
JobExecution jobExecution = jobLauncher1.run(job1, jobParams1);
Thread 2:
JobLauncher jobLauncher1 = context1.getBean(JobLauncher.class);
Job job1 = context1.getBean("inventoryJob", Job.class);
JobExecution jobExecution = jobLauncher1.run(job1, jobParams1);
The advantage of this approach is that every time there is a new file to be read, all the developer/user has to do is subclass AbstractFileLoader and implement the required abstract methods, without needing to get into the details of how to assemble the job.
The questions:
I am new to Spring Batch, so I may have overlooked some not-so-obvious issues with this approach, such as shared internal objects in Spring Batch that could cause two jobs running together to fail, or obvious issues such as bean scoping.
Is there a better way to achieve my objective?
The fileName attribute of the @Value("#{stepExecutionContext['fileName']}") is always assigned the value I:/CK/invoices/partitions/, which is the value returned by getFilesPath in InvoiceLoader, even though getFilesPath in InventoryLoader returns a different value.
One option is passing them as job parameters. For instance:
@Bean
Job job() {
    jobs.get("myJob").start(step1(null)).build()
}

@Bean
@JobScope
Step step1(@Value('#{jobParameters["commitInterval"]}') commitInterval) {
    steps.get('step1')
            .chunk((int) commitInterval)
            .reader(new IterableItemReader(iterable: [1, 2, 3, 4], name: 'foo'))
            .writer(writer(null))
            .build()
}

@Bean
@JobScope
ItemWriter writer(@Value('#{jobParameters["writerClass"]}') writerClass) {
    applicationContext.classLoader.loadClass(writerClass).newInstance()
}
With MyWriter:
class MyWriter implements ItemWriter<Integer> {

    @Override
    void write(List<? extends Integer> items) throws Exception {
        println "Write $items"
    }
}
Then executed with:
def jobExecution = launcher.run(ctx.getBean(Job), new JobParameters([
        commitInterval: new JobParameter(3),
        writerClass: new JobParameter('MyWriter')]))
Output is:
INFO: Executing step: [step1]
Write [1, 2, 3]
Write [4]
Feb 24, 2016 2:30:22 PM org.springframework.batch.core.launch.support.SimpleJobLauncher$1 run
INFO: Job: [SimpleJob: [name=myJob]] completed with the following parameters: [{commitInterval=3, writerClass=MyWriter}] and the following status: [COMPLETED]
Status is: COMPLETED, job execution id 0
#1 step1 COMPLETED
Full example here.

Spring Batch reader file by file

I'm developing a Spring webapp, using the Spring Boot and Spring Batch frameworks.
We have a set of complex and varied JSON files, and we need to:
read each file
slightly modify its content
finally store them in MongoDB.
The question: does it make sense to use Spring Batch for this task? From what I can see in tutorials, examples, etc., Spring Batch is the right tool for line-by-line processing, but what about file-by-file?
I don't have a problem with the writer (MongoItemWriter) and the processor, but I do not see how to implement the reader.
Thanks!
Yes, you can definitely use Spring Batch.
The item for your reader can be a File.
public class CustomItemReader implements ItemReader<File>, InitializingBean {

    private List<File> yourFiles = null;

    @Override
    public File read() {
        if ((yourFiles != null) && (yourFiles.size() != 0)) {
            return yourFiles.remove(0);
        }
        return null;
    }

    // load the items from a service
    private void reloadItems() {
        this.yourFiles = new ArrayList<File>();
        // populate the list of files here
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        reloadItems();
    }
}
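reloadItems() is left open above; a minimal sketch of one way to populate it, assuming the JSON files sit in a fixed input directory (the path is illustrative):
// Sketch only: scan an assumed input directory for the JSON files to process.
private void reloadItems() {
    this.yourFiles = new ArrayList<File>();
    File inputDir = new File("/data/input");
    File[] jsonFiles = inputDir.listFiles((dir, name) -> name.endsWith(".json"));
    if (jsonFiles != null) {
        this.yourFiles.addAll(Arrays.asList(jsonFiles));
    }
}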
A custom processor:
public class MyProcessor implements ItemProcessor<File, File> {

    @Override
    public File process(File file) throws Exception {
        // apply any logic to the file before passing it on to the writer
        return file;
    }
}
And a custom writer:
public class MyWriter implements ItemWriter<File> {

    @Override
    public void write(List<? extends File> files) throws Exception {
        // store each (possibly modified) file, e.g. in MongoDB
    }
}
