Spring Batch FlatFileItemWriter for writing large data to multiple files

I use chunk-oriented processing to write files. I have two tables: files and datas.
config.java
public ListItemReader<> reader(String fileName) {
    listItemReader = selectDataOfFileFromDB(fileName);
    ....
    return listItemReader;
}
public FlatFileItemWriter<> writer(String fileName) {
    FlatFileItemWriter<> delegate = new FlatFileItemWriterBuilder<>()
            .name(fileName + XXX)
            .resource(new FileSystemResource("/xxx/xxx/xxx/" + fileName))
            .build();
    return delegate;
}
public Step xxxxStep(String fileName) {
    return stepBuilderFactory.get("xxxxstep" + XXXX)
            .reader(reader(fileName))
            .writer(writer(fileName))
            .build();
}
@Bean
public Job xxxJob() {
    List<String> list = selectFileNameFromDB();
    JobBuilder xx = jobBuilderFactory.get("XXXXjob");
    SimpleJobBuilder a = xx.start(xxxxStep(list.get(0)));
    a.next(xxxxStep(list.get(1)));
    a.next(xxxxStep(list.get(2)));
    a.next(xxxxStep(list.get(3)));
    .....
    a.next(xxxxStep(list.get(n)));
    return a.build();
}
I can write data to each file, but this approach is not elegant. Is there another solution?
I tried ClassifierCompositeItemWriter, but it is not suitable.
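For what it's worth, the chain of .next(...) calls can at least be collapsed into a loop. This is only a sketch, reusing the xxxxStep(...) method, jobBuilderFactory and selectFileNameFromDB() from the snippet above, and assuming the list of file names is non-empty:

@Bean
public Job xxxJob() {
    List<String> fileNames = selectFileNameFromDB();
    // Start the flow with the first file, then append one step per remaining file
    SimpleJobBuilder builder = jobBuilderFactory.get("XXXXjob")
            .start(xxxxStep(fileNames.get(0)));
    for (int i = 1; i < fileNames.size(); i++) {
        builder = builder.next(xxxxStep(fileNames.get(i)));
    }
    return builder.build();
}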

Related

How to get current resource name

I am using MultiResourceItemReader in order to read and eventually write a list of CSV files to the database.
@StepScope
@Bean
public MultiResourceItemReader<DailyExport> multiResourceItemReader(@Value("#{stepExecutionContext[listNotLoadedFilesPath]}") List<String> notLoadedFilesPath) {
logger.info("** start multiResourceItemReader **");
// cast List of not loaded files to array of resources
List <Resource>tmpList = new ArrayList<Resource>();
notLoadedFilesPath.stream().forEach(fullPath -> {
Resource resource = new FileSystemResource(fullPath);
tmpList.add(resource);
});
Resource [] resourceArr = tmpList.toArray(new Resource[tmpList.size()]);
MultiResourceItemReader<DailyExport> multiResourceItemReader = new MultiResourceItemReader<>();
multiResourceItemReader.setName("dailyExportMultiReader");
multiResourceItemReader.setDelegate(reader(dailyExportMapper()));
multiResourceItemReader.setResources(resourceArr);
return multiResourceItemReader;
}
@Bean
public FlatFileItemReader<DailyExport> reader(FieldSetMapper<DailyExport> testClassRowMapper) {
logger.info("** start reader **");
// Create reader instance
FlatFileItemReader<DailyExport> reader = new FlatFileItemReaderBuilder<DailyExport>()
.name("dailyExportReader")
.linesToSkip(1).fieldSetMapper(testClassRowMapper)
.delimited().delimiter("|").names(dailyExportMetadata)
.build();
return reader;
}
Everything is working well, but I also need to store the current file/resource name.
I found the getCurrentResource API but I couldn't figure out how to use it. Is there a way to get the current resource during the process stage?
public class DailyExportItemProcessor implements ItemProcessor<DailyExport, DailyExport> {
    @Autowired
    public MultiResourceItemReader<DailyExport> multiResourceItemReader;

    @Override
    public DailyExport process(DailyExport item) throws Exception {
        // multiResourceItemReader.getCurrent ??
        return item;
    }
}
Thank you
ResourceAware is what you need: it allows you to set the original resource on the item so you can get access to it in the processor (or anywhere else where the item is in scope):
class DailyExport implements ResourceAware {
    private Resource resource;
    // getter/setter for resource
}
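For reference, a minimal sketch of that class with the getter/setter spelled out; ResourceAware itself only requires the setter (which MultiResourceItemReader calls with the resource each item was read from), and the getter is added here for use in the processor:

import org.springframework.batch.item.ResourceAware;
import org.springframework.core.io.Resource;

public class DailyExport implements ResourceAware {

    private Resource resource;

    // Required by ResourceAware; called by MultiResourceItemReader for each item
    @Override
    public void setResource(Resource resource) {
        this.resource = resource;
    }

    // Convenience accessor for the processor
    public Resource getResource() {
        return resource;
    }

    // ... the existing DailyExport fields and accessors stay as they are
}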
then in the processor:
public class DailyExportItemProcessor implements ItemProcessor<DailyExport, DailyExport> {
    @Override
    public DailyExport process(DailyExport item) throws Exception {
        Resource currentResource = item.getResource();
        // do something with the item/resource
        return item;
    }
}

Spring Integration File synchronizer : Accept files based on a pre defined list

I am transferring files from a remote directory to a local one via SFTP, for processing.
I want to transfer only .csv files, and I have a list of pre-defined filenames.
I couldn't find a FileListFilter that allows specifying multiple patterns and transferring a file if at least one is matched.
So far I have this code, which is working for ".csv" filtering.
The Integration Flow
@Bean
public IntegrationFlow integFlow() {
return IntegrationFlows
.from(ftpMessageSource(), c -> poller())
... more processing
The MessageSource
public MessageSource<File> ftpMessageSource() {
SftpInboundFileSynchronizer fileSynchronizer = new SftpInboundFileSynchronizer(sessionFactory);
fileSynchronizer.setRemoteDirectory(remoteDirectory);
fileSynchronizer.setDeleteRemoteFiles(true);
fileSynchronizer.setFilter(new SftpRegexPatternFileListFilter(Constantes.EXTENSION));
SftpInboundFileSynchronizingMessageSource ftpInboundFileSync =
new SftpInboundFileSynchronizingMessageSource(fileSynchronizer);
ftpInboundFileSync.setLocalDirectory(new File(workDirectory));
ftpInboundFileSync.setAutoCreateLocalDirectory(true);
CompositeFileListFilter<File> compositeFileListFilter = new CompositeFileListFilter<>();
compositeFileListFilter.addFilter(new RegexPatternFileListFilter(Constantes.EXTENSION));
ftpInboundFileSync.setLocalFilter(compositeFileListFilter);
return ftpInboundFileSync;
}
Constantes.EXTENSION is a regex accepting .csv and .CSV. This works fine.
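(For reference, the question does not show the constant itself; a case-insensitive pattern along these lines would accept both extensions.)

// Illustrative only: the actual value of Constantes.EXTENSION is not shown in the question
public static final String EXTENSION = "(?i)^.*\\.csv$";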
Say that I have a String list that contains "string1", "string2", "string3" and I want to transfer every file of the form string1*, string2* or string3*. How would I proceed?
@SpringBootApplication
public class So59161698Application {
public static void main(String[] args) {
SpringApplication.run(So59161698Application.class, args);
}
private final String myPatterns = "foo,bar,baz";
@Bean
public FileListFilter<File> filter() {
Set<String> patterns = StringUtils.commaDelimitedListToSet(this.myPatterns);
return files -> Arrays.stream(files)
.filter(file -> patterns.stream()
.filter(pattern -> file.getName().startsWith(pattern))
.findFirst()
.isPresent())
.collect(Collectors.toList());
}
@Bean
public ApplicationRunner runner(FileListFilter<File> filter) {
return args -> {
System.out.println(filter.filterFiles(new File[] {
new File("foo.csv"),
new File("bar.csv"),
new File("baz.csv"),
new File("qux.csv")
}));
};
}
}
[foo.csv, bar.csv, baz.csv]
There is a CompositeFileListFilter:
* Simple {@link FileListFilter} that predicates its matches against <b>all</b> of the
* configured {@link FileListFilter}.
With the logic like:
public boolean accept(F file) {
AtomicBoolean allAccept = new AtomicBoolean(true);
// we can't use stream().allMatch() because we have to call all filters for this filter's contract
this.fileFilters.forEach(f -> allAccept.compareAndSet(true, f.accept(file)));
return allAccept.get();
}
So, you configure this CompositeFileListFilter with several SftpRegexPatternFileListFilter delegates, and your files are going to be processed whenever they match at least one of the filters in the CompositeFileListFilter.
See more about filters in the docs: https://docs.spring.io/spring-integration/docs/current/reference/html/file.html#file-reading
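Note that, per the accept() logic quoted above, the composite only lets a file through when every delegate accepts it. So one hedged sketch for the original requirement (a .csv extension and one of the predefined name prefixes) is to combine the extension regex filter with the prefix-based lambda from the first answer; the Constantes.EXTENSION constant and the pattern list come from the question and are otherwise illustrative:

import java.io.File;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;
import org.springframework.integration.file.filters.CompositeFileListFilter;
import org.springframework.integration.file.filters.FileListFilter;
import org.springframework.integration.file.filters.RegexPatternFileListFilter;
import org.springframework.util.StringUtils;

// Inside the existing configuration class:
public CompositeFileListFilter<File> localFilter() {
    Set<String> prefixes = StringUtils.commaDelimitedListToSet("string1,string2,string3");

    // Accept a file only if its name starts with one of the predefined prefixes
    FileListFilter<File> byPrefix = files -> Arrays.stream(files)
            .filter(file -> prefixes.stream().anyMatch(p -> file.getName().startsWith(p)))
            .collect(Collectors.toList());

    CompositeFileListFilter<File> composite = new CompositeFileListFilter<>();
    composite.addFilter(new RegexPatternFileListFilter(Constantes.EXTENSION)); // .csv / .CSV
    composite.addFilter(byPrefix);
    return composite; // pass to ftpInboundFileSync.setLocalFilter(...)
}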

Spring Batch partitioning of DB not working properly

I have configured a job as follows, which reads from the database and writes into files, partitioning the data on the basis of a sequence.
//Job Config
@Bean
public Job job(JobBuilderFactory jobBuilderFactory) throws Exception {
Flow masterFlow1 = (Flow) new FlowBuilder<Object>("masterFlow1").start(masterStep()).build();
return (jobBuilderFactory.get("Partition-Job")
.incrementer(new RunIdIncrementer())
.start(masterFlow1)
.build()).build();
}
@Bean
public Step masterStep() throws Exception
{
return stepBuilderFactory.get(MASTERPPREPAREDATA)
//.listener(customSEL)
.partitioner(STEPPREPAREDATA,new DBPartitioner())
.step(prepareDataForS1())
.gridSize(gridSize)
.taskExecutor(new SimpleAsyncTaskExecutor("Thread"))
.build();
}
@Bean
public Step prepareDataForS1() throws Exception
{
return stepBuilderFactory.get(STEPPREPAREDATA)
//.listener(customSEL)
.<InputData,InputData>chunk(chunkSize)
.reader(JDBCItemReader(0,0))
.writer(writer(null))
.build();
}
@Bean(destroyMethod = "")
@StepScope
public JdbcCursorItemReader<InputData> JDBCItemReader(@Value("#{stepExecutionContext[startingIndex]}") int startingIndex,
        @Value("#{stepExecutionContext[endingIndex]}") int endingIndex)
{
JdbcCursorItemReader<InputData> ir = new JdbcCursorItemReader<>();
ir.setDataSource(batchDataSource);
ir.setMaxItemCount(DBPartitioner.partitionSize);
ir.setSaveState(false);
ir.setRowMapper(new InputDataRowMapper());
ir.setSql("SELECT * FROM FIF_INPUT fi WHERE fi.SEQ > ? AND fi.SEQ < ?");
ir.setPreparedStatementSetter(new PreparedStatementSetter() {
@Override
public void setValues(PreparedStatement ps) throws SQLException {
ps.setInt(1, startingIndex);
ps.setInt(2, endingIndex);
}
});
return ir;
}
@Bean
@StepScope
public FlatFileItemWriter<InputData> writer(@Value("#{stepExecutionContext[index]}") String index)
{
System.out.println("writer initialized!!!!!!!!!!!!!"+index);
//Create writer instance
FlatFileItemWriter<InputData> writer = new FlatFileItemWriter<>();
//Set output file location
writer.setResource(new FileSystemResource(batchDirectory+relativeInputDirectory+index+inputFileForS1));
//All job repetitions should "append" to same output file
writer.setAppendAllowed(false);
//Name field values sequence based on object properties
writer.setLineAggregator(customLineAggregator);
return writer;
}
The partitioner used to partition the DB is written separately in another file, as follows:
//PartitionDb.java
public class DBPartitioner implements Partitioner{
public static int partitionSize;
private static Log log = LogFactory.getLog(DBPartitioner.class);
@SuppressWarnings("unchecked")
@Override
public Map<String, ExecutionContext> partition(int gridSize) {
log.debug("START: Partition"+"grid size:"+gridSize);
@SuppressWarnings("rawtypes")
Map partitionMap = new HashMap<>();
int startingIndex = -1;
int endSize = partitionSize+1;
for(int i=0; i< gridSize; i++){
ExecutionContext ctxMap = new ExecutionContext();
ctxMap.putInt("startingIndex",startingIndex);
ctxMap.putInt("endingIndex", endSize);
ctxMap.put("index", i);
startingIndex = endSize-1;
endSize += partitionSize;
partitionMap.put("Thread:-"+i, ctxMap);
}
log.debug("END: Created Partitions of size: "+ partitionMap.size());
return partitionMap;
}
}
This is executing properly, but the problem is that even after partitioning on the basis of the sequence, I am getting the same rows in multiple files, which is not right, as I am providing a different set of data for each partition. Can anyone tell me what is wrong? I am using HikariCP for DB connection pooling and Spring Batch 4.
This is executing properly, but the problem is that even after partitioning on the basis of the sequence, I am getting the same rows in multiple files, which is not right, as I am providing a different set of data for each partition.
I'm not sure your partitioner is working properly. A quick test shows that it is not providing different sets of data as you are claiming:
DBPartitioner dbPartitioner = new DBPartitioner();
Map<String, ExecutionContext> partition = dbPartitioner.partition(5);
for (String s : partition.keySet()) {
System.out.println(s + " : " + partition.get(s));
}
This prints:
Thread:-0 : {endingIndex=1, index=0, startingIndex=-1}
Thread:-1 : {endingIndex=1, index=1, startingIndex=0}
Thread:-2 : {endingIndex=1, index=2, startingIndex=0}
Thread:-3 : {endingIndex=1, index=3, startingIndex=0}
Thread:-4 : {endingIndex=1, index=4, startingIndex=0}
As you can see, almost all partitions will have the same startingIndex and endingIndex.
I recommend you unit test your partitioner before using it in a partitioned step.
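As a minimal sketch of such a test (JUnit 5 assumed; the expected values follow the arithmetic in partition() once partitionSize is set to 100, something the posted code never does explicitly):

import static org.junit.jupiter.api.Assertions.assertEquals;
import java.util.Map;
import org.junit.jupiter.api.Test;
import org.springframework.batch.item.ExecutionContext;

class DBPartitionerTest {

    @Test
    void partitionsShouldCoverDistinctRanges() {
        DBPartitioner.partitionSize = 100; // must be initialized before partitioning
        Map<String, ExecutionContext> partitions = new DBPartitioner().partition(3);

        // First partition covers SEQ in (-1, 101), second covers (100, 201), and so on
        assertEquals(-1, partitions.get("Thread:-0").getInt("startingIndex"));
        assertEquals(101, partitions.get("Thread:-0").getInt("endingIndex"));
        assertEquals(100, partitions.get("Thread:-1").getInt("startingIndex"));
        assertEquals(201, partitions.get("Thread:-1").getInt("endingIndex"));
    }
}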

Load Multiple CSV files into database using Spring Batch

I want to load multiple CSV files into a single table of a MySQL database using Spring Batch. The paths of the files are derived from the following method.
public List<String> getFilePath() {
String inputPath = "E:\\input";
List<String> inputCSVPaths = new ArrayList<String>();
Map<String, List<String>> inputInfo = new HashMap<String, List<String>>();
File inputFolder = new File(inputPath);
File[] inputFiles = inputFolder.listFiles();
for (File file : inputFiles) {
inputCSVPaths.add(file.getAbsolutePath());
}
inputInfo.put("Introduction", inputCSVPaths);
List<String> inputFile = inputInfo.get("Introduction");
System.out.println("Input File :"+inputFile);
return inputFile;
}
There are three CSV files in total, but it reads only one file and inserts the data of only that CSV file. Is there something wrong in how the resources are obtained?
@Autowired
private FilePathDemo filePathDemo;
@Bean
public MultiResourceItemReader<Introduction> multiResourceItemReader() throws IOException {
MultiResourceItemReader<Introduction> multiReader = new MultiResourceItemReader<Introduction>();
ResourcePatternResolver patternResolver = new PathMatchingResourcePatternResolver();
Resource[] resources;
String filePath = "file:";
List<String> path = filePathDemo.getFilePath();
for (String introPath : path) {
System.out.println("File Path of the Introduction CSV :" + introPath);
resources = patternResolver.getResources(filePath + introPath);
multiReader.setResources(resources);
}
FlatFileItemReader<Introduction> flatReader = new FlatFileItemReader<Introduction>();
multiReader.setDelegate(flatReader);
flatReader.setLinesToSkip(1);
flatReader.setLineMapper(new DefaultLineMapper<Introduction>() {
{
setLineTokenizer(new DelimitedLineTokenizer() {
{
setNames(new String[] { "id", "name", "age", "phoneNo"});
}
});
setFieldSetMapper(new BeanWrapperFieldSetMapper<Introduction>() {
{
setTargetType(Introduction.class);
}
});
}
});
flatReader.close();
multiReader.close();
return multiReader;
}
There are two issues with your configuration:
1. You are reassigning the resources array with a single file in the for loop. Hence, the MultiResourceItemReader will be configured with only one file.
2. You are calling the close method on the MultiResourceItemReader and the delegate FlatFileItemReader, but you should not. Spring Batch will call those methods when the step is complete.
You can find an example of how to configure the MultiResourceItemReader here: https://docs.spring.io/spring-batch/4.0.x/reference/html/readersAndWriters.html#multiFileInput
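For illustration, a hedged sketch of the reader bean with both points applied; filePathDemo and Introduction come from the question, and the delegate is assumed to be the same FlatFileItemReader built in the original method (shown here only as a call to a hypothetical flatFileItemReader() bean):

import java.util.List;
import org.springframework.batch.item.file.MultiResourceItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;

@Bean
public MultiResourceItemReader<Introduction> multiResourceItemReader() {
    // Build one Resource per CSV path instead of overwriting the array inside a loop
    List<String> paths = filePathDemo.getFilePath();
    Resource[] resources = paths.stream()
            .map(FileSystemResource::new)
            .toArray(Resource[]::new);

    MultiResourceItemReader<Introduction> multiReader = new MultiResourceItemReader<Introduction>();
    multiReader.setResources(resources);
    multiReader.setDelegate(flatFileItemReader()); // hypothetical bean returning the FlatFileItemReader<Introduction>
    // No close() calls here: Spring Batch closes the readers when the step completes
    return multiReader;
}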

Spring batch chunk size

I am new to Spring Batch and I think I am stuck on something basic.
I created a Job configuration like this:
//reader
@Bean
public ItemReader<UnprocessedTrek> atReader() {
//AnalyzeTrekItemReader reader = new AnalyzeTrekItemReader();
JdbcCursorItemReader<UnprocessedTrek> reader = new JdbcCursorItemReader<UnprocessedTrek>();
reader.setSql("SELECT * FROM " + UnprocessedTrek.TBL_NAME);
reader.setRowMapper(new UnprocessedTrekRowMapper());
reader.setDataSource(rntDataSource);
reader.setFetchSize(0);
return reader;
}
//processor
@Bean
public ItemProcessor<UnprocessedTrek, Document> atProcessor()
{
AnalyzeTrekItemProcessor processor = new AnalyzeTrekItemProcessor();
return processor;
}
//writer
@Bean
public ItemWriter<Document> atWriter()
{
AnalyzeTrekItemWriter writer = new AnalyzeTrekItemWriter();
return writer;
}
@Bean
public Step analyzeTrek()
{
return steps.get("analyzeTrek")
.<UnprocessedTrek, Document> chunk(50)
.reader(atReader())
.processor(atProcessor())
.writer(atWriter())
.build();
}
My problem is that when the number of items processed is less than 50, the writer is not called. What am I missing in my configuration?
Thanks for your help.
