Multiple input file Spring Batch - spring

I'm trying to develop a batch job that processes a directory of files with Spring Batch.
I looked at the MultiResourcePartitioner and tried something like this:
<job parent="loggerParent" id="importContractESTD" xmlns="http://www.springframework.org/schema/batch">
<step id="multiImportContractESTD">
<batch:partition step="partitionImportContractESTD" partitioner="partitioner">
<batch:handler grid-size="5" task-executor="taskExecutor" />
</batch:partition>
</step>
</job>
<bean id="partitioner" class="org.springframework.batch.core.partition.support.MultiResourcePartitioner">
<property name="keyName" value="inputfile" />
<property name="resources" value="file:${import.contract.filePattern}" />
</bean>
<step id="partitionImportContractESTD" xmlns="http://www.springframework.org/schema/batch">
<batch:job ref="importOneContractESTD" job-parameters-extractor="defaultJobParametersExtractor" />
</step>
<bean id="defaultJobParametersExtractor" class="org.springframework.batch.core.step.job.DefaultJobParametersExtractor"
scope="step" />
<!-- Job importContractESTD definition -->
<job parent="loggerParent" id="importOneContractESTD" xmlns="http://www.springframework.org/schema/batch">
<step parent="baseStep" id="initStep" next="calculateMD5">
<tasklet ref="initTasklet" />
</step>
<step id="calculateMD5" next="importContract">
<tasklet ref="md5Tasklet">
<batch:listeners>
<batch:listener ref="md5Tasklet" />
</batch:listeners>
</tasklet>
</step>
<step id="importContract">
<tasklet>
<chunk reader="contractReader" processor="contractProcessor" writer="contractWriter" commit-interval="${commit.interval}" />
<batch:listeners>
<batch:listener ref="contractProcessor" />
</batch:listeners>
</tasklet>
</step>
</job>
<!-- Chunk definition : Contract ItemReader -->
<bean id="contractReader" class="com.sopra.banking.cirbe.acquisition.batch.AcquisitionFileReader" scope="step">
<property name="resource" value="#{stepExecutionContext[inputfile]}" />
<property name="lineMapper">
<bean id="contractLineMappe" class="org.springframework.batch.item.file.mapping.PatternMatchingCompositeLineMapper">
<property name="tokenizers">
<map>
<entry key="1*" value-ref="headerTokenizer" />
<entry key="2*" value-ref="contractTokenizer" />
</map>
</property>
<property name="fieldSetMappers">
<map>
<entry key="1*" value-ref="headerMapper" />
<entry key="2*" value-ref="contractMapper" />
</map>
</property>
</bean>
</property>
</bean>
<!-- MD5 Tasklet -->
<bean id="md5Tasklet" class="com.sopra.banking.cirbe.acquisition.batch.AcquisitionMD5Tasklet">
<property name="file" value="#{stepExecutionContext[inputfile]}" />
</bean>
But what I get is:
Caused by: org.springframework.expression.spel.SpelEvaluationException: EL1008E:(pos 0): Field or property 'stepExecutionContext' cannot be found on object of type 'org.springframework.beans.factory.config.BeanExpressionContext'
What I'm looking for is a way to launch my job importOneContractESTD for each file matched by file:${import.contract.filePattern}. Each file is shared between the step calculateMD5 (which puts the MD5 of the processed file into my jobContext) and the step importContract (which reads that MD5 from the jobContext and adds it as data to each line processed by the contractProcessor).
If I call importOneContractESTD with a single file given as a parameter (e.g. replacing #{stepExecutionContext[inputfile]} with ${my.file}), it works. But I want to use Spring Batch to manage my directory rather than my calling shell script.
Thanks for your ideas!

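Add scope="step" to every bean that needs to access stepExecutionContext. Late-binding expressions such as #{stepExecutionContext[inputfile]} are only resolved for step-scoped beans; without the scope, the expression is evaluated when the application context starts, which produces the EL1008E error above. Your contractReader already has it, but md5Tasklet does not: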
<bean id="md5Tasklet" class="com.sopra.banking.cirbe.acquisition.batch.AcquisitionMD5Tasklet" scope="step">
<property name="file" value="#{stepExecutionContext[inputfile]}" />
</bean>
More info here.
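One further point may matter in this setup. The steps calculateMD5 and importContract belong to the child job importOneContractESTD, while the partitioner writes inputfile into the execution context of the partitioned step in the parent job. A minimal sketch, assuming the value has to be forwarded explicitly: list the key on DefaultJobParametersExtractor through its keys property so that it becomes a job parameter of the child job, then bind to jobParameters instead. Treat this as something to verify against your Spring Batch version, not as a confirmed fix.

```xml
<!-- Sketch: copy the partition key into the child job's parameters -->
<bean id="defaultJobParametersExtractor"
      class="org.springframework.batch.core.step.job.DefaultJobParametersExtractor">
  <!-- "inputfile" is the keyName configured on the MultiResourcePartitioner -->
  <property name="keys" value="inputfile" />
</bean>

<!-- Assumption: inside the child job the value is read as a job parameter -->
<bean id="md5Tasklet" class="com.sopra.banking.cirbe.acquisition.batch.AcquisitionMD5Tasklet" scope="step">
  <property name="file" value="#{jobParameters['inputfile']}" />
</bean>
```

The contractReader bean would need the same change to its resource property if it turns out to be affected in the same way.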

Related

Spring Batch: is this a tasklet or chunk?

I'm a little bit confused!
Spring Batch provides two different ways for implementing a job: using tasklets and chunks.
So, when I have this:
<tasklet>
<chunk
reader = 'itemReader'
processor = 'itemProcessor'
writer = 'itemWriter'
/>
</tasklet>
What kind of implementation is this? Tasklet? Chunk?
That's a chunk type step, because inside the <tasklet> element is a <chunk> element that defines a reader, writer, and/or processor.
Below is an example of a job executing first a chunk and second a tasklet step:
<job id="readMultiFileJob" xmlns="http://www.springframework.org/schema/batch">
<step id="step1" next="deleteDir">
<tasklet>
<chunk reader="multiResourceReader" writer="flatFileItemWriter"
commit-interval="1" />
</tasklet>
</step>
<step id="deleteDir">
<tasklet ref="fileDeletingTasklet" />
</step>
</job>
<bean id="fileDeletingTasklet" class="com.mkyong.tasklet.FileDeletingTasklet" >
<property name="directory" value="file:csv/inputs/" />
</bean>
<bean id="multiResourceReader"
class="org.springframework.batch.item.file.MultiResourceItemReader">
<property name="resources" value="file:csv/inputs/domain-*.csv" />
<property name="delegate" ref="flatFileItemReader" />
</bean>
Thus you can see that the distinction is actually on the level of steps, not for the entire job.

Fewer threads than expected run in parallel - Spring Batch Remote Partitioning

I am working on a Spring Batch project where I have a file of 2 million records. I process each record and then save it to a database. The processing is time-consuming, so I am using Spring Batch remote partitioning.
First I manually split the file into 15 files, and then, using MultiResourcePartitioner, I assign each file to a single thread. But I noticed that at the start only 4 threads run in parallel, and the number of threads running in parallel decreases over time.
This is the configuration:
<batch:job id="GhanshyamESCatalogUpdater">
<batch:step id="GhanshyamCatalogUpdater2" >
<batch:partition step="slave" partitioner="rangePartitioner">
<batch:handler grid-size="15" task-executor="taskExecutor" />
</batch:partition>
</batch:step>
<batch:listeners>
<batch:listener ref="jobFailureListener"/>
</batch:listeners>
</batch:job>
<bean id="rangePartitioner" class="org.springframework.batch.core.partition.support.MultiResourcePartitioner" scope="step">
<property name="resources" value="file:#{jobParameters['job.partitionDir']}/x*">
</property>
</bean>
<step id="slave" xmlns="http://www.springframework.org/schema/batch">
<tasklet>
<chunk reader="gsbmyntraXmlReader" writer="gsbmyntraESWriter" commit-interval="1000" />
</tasklet>
</step>
This is the Task Executor:
<bean id="taskExecutor"
class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="100" />
<property name="allowCoreThreadTimeOut" value="true" />
<property name="waitForTasksToCompleteOnShutdown" value="true" />
</bean>

Get file name from readMultiFileJob in Spring Batch

The following is my Spring Batch configuration file. I am reading multiple files (XML, CSV, etc.) that are generated dynamically with a timestamp as a suffix. I can read each file's data and process it.
My question is: how can I get the name of the file currently being read while the job is processing?
<import resource="../config/context.xml" />
<bean id="domain" class="com.di.pos.Domain" />
<job id="readMultiFileJob" xmlns="http://www.springframework.org/schema/batch">
<step id="step1">
<tasklet>
<chunk reader="multiResourceReader" writer="flatFileItemWriter"
commit-interval="1" />
</tasklet>
</step>
</job>
<bean id="multiResourceReader"
class="org.springframework.batch.item.file.MultiResourceItemReader">
<property name="resources" value="file:csv/inputs/dipos-*.csv" />
<property name="delegate" ref="flatFileItemReader" />
</bean>
<bean id="flatFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="names" value="id, name" />
</bean>
</property>
<property name="fieldSetMapper">
<bean
class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
<property name="prototypeBeanName" value="domain" />
</bean>
</property>
</bean>
</property>
</bean>
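Create a custom mapper that implements LineMapper (it is an interface) and give it a reference to the MultiResourceItemReader, called delegator here.
Override the mapLine() method:
public FileData mapLine(String line, int lineNumber) throws Exception {
FileData fileData = new FileData();
// delegator is the MultiResourceItemReader; it knows which file is currently being read
Resource currentResource = delegator.getCurrentResource();
// getFilename() returns the file name only, without the directory path
String fileName = currentResource.getFilename();
// Use this to access the full file path
URI fileUri = currentResource.getURI();
// Populate fileData from line and fileName as needed before returning it
return fileData;
}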
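The mapper above still has to be wired to the reader. A minimal sketch of one way to do that, assuming a hypothetical class com.di.pos.FileNameLineMapper that implements LineMapper and exposes a setDelegator(MultiResourceItemReader) setter (neither name comes from the question):

```xml
<!-- Hypothetical class: com.di.pos.FileNameLineMapper implements LineMapper
     and has a "delegator" property of type MultiResourceItemReader. -->
<bean id="flatFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
  <property name="lineMapper">
    <bean class="com.di.pos.FileNameLineMapper">
      <!-- Reference back to the reader that knows which file is currently open -->
      <property name="delegator" ref="multiResourceReader" />
    </bean>
  </property>
</bean>
```

This creates a circular reference (multiResourceReader to flatFileItemReader to the mapper and back), so the sketch assumes setter injection on singleton beans in a container that still permits circular references.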

How to set property using "tasklet ref" tag

I have a tasklet ValidarSituacaoTasklet that has a property situacao. This tasklet is used in 2 steps with distinct values for situacao. I declared the steps like this:
and the bean:
<bean id="validarSituacaoTasklet" class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
</bean>
I have to pass 'situacao' to the tasklet.
I tried:
<step id="validaSituacaoStep">
<tasklet ref="validarSituacaoTasklet ">
<property name="situacao" value="EM_FECHAMENTO"/>
</tasklet>
</step>
but it does not seem to be the right way to do it.
Isn't this what you want:
<step id="validaSituacaoStep">
<tasklet ref="validarSituacaoTasklet"/>
</step>
<bean id="validarSituacaoTasklet" class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
<property name="situacao" value="EM_FECHAMENTO"/>
</bean>
UPDATE
Based on the comment left, this should work:
<step id="validaSituacaoStep">
<tasklet>
<bean class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
<property name="situacao" value="EM_FECHAMENTO"/>
</bean>
</tasklet>
</step>
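Since the question mentions two steps with different values, the first suggestion can be extended by declaring one bean per value. A minimal sketch: the second value FECHADO and all bean and step ids below are hypothetical placeholders, not taken from the question.

```xml
<!-- One bean definition per value of situacao; same class, different ids -->
<bean id="validarEmFechamentoTasklet" class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
  <property name="situacao" value="EM_FECHAMENTO"/>
</bean>
<bean id="validarFechadoTasklet" class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
  <!-- FECHADO is a placeholder for the second value -->
  <property name="situacao" value="FECHADO"/>
</bean>

<!-- Inside the job: each step references its own bean -->
<step id="validaEmFechamentoStep" next="validaFechadoStep">
  <tasklet ref="validarEmFechamentoTasklet"/>
</step>
<step id="validaFechadoStep">
  <tasklet ref="validarFechadoTasklet"/>
</step>
```

Compared with the inline bean, this keeps each tasklet addressable by id, which helps if a listener or another step needs to reference it.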
Have you tried the following?
<bean id="validarSituacaoTasklet" class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
<property name="situacao" ref="daoBean"/>
</bean>
The DAO should be referenced in your bean's definition.

Spring Batch - Disappearing threads while using Partitioner

I am using Spring Batch to process a large volume of data daily, so we decided to use Spring Batch partitioning.
Below is my configuration:
<job id="test" xmlns="http://www.springframework.org/schema/batch">
<step id="masterStep">
<partition step="step2" partitioner="multiPartioner">
<handler grid-size="3" task-executor="taskExecutor" />
</partition>
</step>
</job>
<bean id="multiPartioner"
class="org.springframework.batch.core.partition.support.MultiResourcePartitioner"
scope="step">
<property name="resources" value="file:#{jobParameters[fileDirectory]}/*" />
</bean>
<bean id="taskExecutor"
class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="10" />
<property name="maxPoolSize" value="10" />
</bean>
<step id="step2" xmlns="http://www.springframework.org/schema/batch">
<tasklet transaction-manager="transactionManager">
<chunk reader="multiResourceItemReader" writer="testWriter"
commit-interval="20000">
</chunk>
</tasklet>
</step>
When I set corePoolSize to 4, it works fine without any issues. But if I increase corePoolSize to 10, the job starts executing, and after some time (about 20 minutes) nothing more is executed. There are no logs, no errors, and no status about what is happening; the job just sits idle.
Please help me resolve this issue.
