Spring Batch: is this a tasklet or chunk?

I'm a little bit confused!
Spring Batch provides two different ways for implementing a job: using tasklets and chunks.
So, when I have this:
<tasklet>
    <chunk
        reader="itemReader"
        processor="itemProcessor"
        writer="itemWriter"
    />
</tasklet>
What kind of implementation is this? Tasklet? Chunk?

That's a chunk-oriented step: the <tasklet> element contains a <chunk> element that defines a reader, a writer and, optionally, a processor.
Below is an example of a job that first executes a chunk-oriented step and then a tasklet step:
<job id="readMultiFileJob" xmlns="http://www.springframework.org/schema/batch">
    <step id="step1" next="deleteDir">
        <tasklet>
            <chunk reader="multiResourceReader" writer="flatFileItemWriter"
                   commit-interval="1" />
        </tasklet>
    </step>
    <step id="deleteDir">
        <tasklet ref="fileDeletingTasklet" />
    </step>
</job>
<bean id="fileDeletingTasklet" class="com.mkyong.tasklet.FileDeletingTasklet">
    <property name="directory" value="file:csv/inputs/" />
</bean>
<bean id="multiResourceReader"
      class="org.springframework.batch.item.file.MultiResourceItemReader">
    <property name="resources" value="file:csv/inputs/domain-*.csv" />
    <property name="delegate" ref="flatFileItemReader" />
</bean>
So the chunk/tasklet distinction is made per step, not for the entire job: a single job can mix chunk-oriented steps and tasklet steps, as readMultiFileJob does.
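To make the difference concrete, here is a plain-Java sketch (no Spring Batch on the classpath) of what a chunk-oriented step does with its reader, processor, writer and commit-interval. In Spring Batch the chunk loop is itself run by a tasklet (ChunkOrientedTasklet), which is why <chunk> is nested inside <tasklet>. The class and method names below are made up for illustration, and the real framework adds transactions, skip/retry and restart state that this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class ChunkLoopSketch {
    // Simplified model of a chunk-oriented step: read up to commitInterval items,
    // process each one, then hand the whole chunk to the writer in a single call.
    // Transactions, skip/retry and restart state are deliberately omitted.
    public static <I, O> int runChunkStep(Iterator<I> reader,
                                          Function<I, O> processor,
                                          Consumer<List<O>> writer,
                                          int commitInterval) {
        int chunks = 0;
        while (reader.hasNext()) {
            List<O> chunk = new ArrayList<>();
            while (chunk.size() < commitInterval && reader.hasNext()) {
                chunk.add(processor.apply(reader.next()));
            }
            writer.accept(chunk); // the real framework commits a transaction per chunk
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        Function<Integer, String> processor = i -> "item-" + i;
        Consumer<List<String>> writer = written::add;
        int chunks = runChunkStep(List.of(1, 2, 3, 4, 5, 6, 7).iterator(), processor, writer, 3);
        // prints: 3 chunks: [[item-1, item-2, item-3], [item-4, item-5, item-6], [item-7]]
        System.out.println(chunks + " chunks: " + written);
    }
}
```

With commit-interval="1", as in step1 above, every item would be written and committed on its own.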

Related

Fewer threads than expected running in parallel - Spring Batch Remote Partitioning

I am working on a Spring Batch project where I have a file of 2 million records. I do some processing on it and then save it to a database. The processing is time-consuming, so I am using Spring Batch remote partitioning.
First I manually split the file into 15 files, and then, using MultiResourcePartitioner, I assign each file to a single thread. But I noticed that at the start only 4 threads run in parallel, and after some time the number of threads running in parallel keeps decreasing.
This is the configuration:
<batch:job id="GhanshyamESCatalogUpdater">
    <batch:step id="GhanshyamCatalogUpdater2">
        <batch:partition step="slave" partitioner="rangePartitioner">
            <batch:handler grid-size="15" task-executor="taskExecutor" />
        </batch:partition>
    </batch:step>
    <batch:listeners>
        <batch:listener ref="jobFailureListener" />
    </batch:listeners>
</batch:job>
<bean id="rangePartitioner" class="org.springframework.batch.core.partition.support.MultiResourcePartitioner" scope="step">
    <property name="resources" value="file:#{jobParameters['job.partitionDir']}/x*" />
</bean>
<step id="slave" xmlns="http://www.springframework.org/schema/batch">
    <tasklet>
        <chunk reader="gsbmyntraXmlReader" writer="gsbmyntraESWriter" commit-interval="1000" />
    </tasklet>
</step>
This is the task executor:
<bean id="taskExecutor"
      class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    <property name="corePoolSize" value="100" />
    <property name="allowCoreThreadTimeOut" value="true" />
    <property name="waitForTasksToCompleteOnShutdown" value="true" />
</bean>
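ThreadPoolTaskExecutor wraps a java.util.concurrent.ThreadPoolExecutor, so one way to check whether the executor itself caps concurrency is to run the same number of long-running tasks on a plain ThreadPoolExecutor with equivalent settings. This is a diagnostic sketch, not a fix, and the class and method names are made up. It relies on a standard ThreadPoolExecutor rule: with an unbounded queue (ThreadPoolTaskExecutor's default queue capacity is Integer.MAX_VALUE), no threads beyond corePoolSize are ever created, so corePoolSize=100 should let all 15 partitions start at once, while a smaller core size runs at most that many at a time. If all 15 run concurrently here, the executor configuration is probably not what limits the job.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PartitionConcurrencyCheck {
    // Returns true if `tasks` tasks (stand-ins for partitions) were all running at the same
    // time on a ThreadPoolExecutor with the given core/max sizes and an unbounded queue.
    public static boolean allTasksRunConcurrently(int corePoolSize, int maxPoolSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(corePoolSize, maxPoolSize,
                60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true);
        CountDownLatch started = new CountDownLatch(tasks);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        release.await(); // keep the thread busy, like a long-running partition
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // If the pool cannot run all tasks at once, the latch never reaches zero.
            return started.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let running and queued tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("core=100: " + allTasksRunConcurrently(100, Integer.MAX_VALUE, 15));
        System.out.println("core=4:   " + allTasksRunConcurrently(4, Integer.MAX_VALUE, 15));
    }
}
```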

How to set a property using the "tasklet ref" tag

I have a tasklet ValidarSituacaoTasklet that has a property situacao. This tasklet is used in two steps with distinct values for situacao. I declared the steps like this:
and the bean:
<bean id="validarSituacaoTasklet" class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
</bean>
I have to pass 'situacao' to the tasklet.
I tried:
<step id="validaSituacaoStep">
    <tasklet ref="validarSituacaoTasklet">
        <property name="situacao" value="EM_FECHAMENTO"/>
    </tasklet>
</step>
but it does not seem to be the right way to do it.
Isn't this what you want?
<step id="validaSituacaoStep">
    <tasklet ref="validarSituacaoTasklet"/>
</step>
<bean id="validarSituacaoTasklet" class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
    <property name="situacao" value="EM_FECHAMENTO"/>
</bean>
UPDATE
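Based on the comment left, this should work: declare the tasklet as an inner bean of each step, so that each step gets its own instance with its own situacao value.
<step id="validaSituacaoStep">
    <tasklet>
        <bean class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
            <property name="situacao" value="EM_FECHAMENTO"/>
        </bean>
    </tasklet>
</step>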
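The <property> element is setter injection: Spring looks up a JavaBeans property with that name on the bean class and calls its setter, which is why it belongs on a bean definition (top-level or inner) rather than on <tasklet>. Below is a stdlib-only sketch of that mechanism using java.beans.Introspector. ValidarSituacaoTasklet here is a hypothetical stand-in for the asker's class, OUTRA_SITUACAO is an invented second value, and Spring's real implementation (BeanWrapper) does much more, such as type conversion.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;

public class PropertyInjectionSketch {
    // Hypothetical stand-in for my.package.tasklet.ValidarSituacaoTasklet; the real class
    // would also implement Spring Batch's Tasklet interface.
    public static class ValidarSituacaoTasklet {
        private String situacao;
        public String getSituacao() { return situacao; }
        public void setSituacao(String situacao) { this.situacao = situacao; }
    }

    // Rough model of <property name="..." value="..."/> for a String property:
    // find the JavaBeans property by name and invoke its setter.
    public static void setProperty(Object bean, String name, String value) {
        try {
            for (PropertyDescriptor pd : Introspector.getBeanInfo(bean.getClass()).getPropertyDescriptors()) {
                Method setter = pd.getWriteMethod();
                if (pd.getName().equals(name) && setter != null) {
                    setter.invoke(bean, value);
                    return;
                }
            }
        } catch (IntrospectionException | ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
        throw new IllegalArgumentException("No writable property '" + name + "' on " + bean.getClass().getName());
    }

    public static void main(String[] args) {
        // One bean definition per step gives one independent instance per step.
        ValidarSituacaoTasklet step1 = new ValidarSituacaoTasklet();
        ValidarSituacaoTasklet step2 = new ValidarSituacaoTasklet();
        setProperty(step1, "situacao", "EM_FECHAMENTO");
        setProperty(step2, "situacao", "OUTRA_SITUACAO"); // hypothetical second value
        System.out.println(step1.getSituacao() + " / " + step2.getSituacao());
    }
}
```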
Have you tried the following?
<bean id="validarSituacaoTasklet" class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
    <property name="situacao" ref="daoBean"/>
</bean>
The DAO should be referenced in your bean's definition.

Spring Batch - Disappearing threads while using a Partitioner

I am using Spring Batch to process large volumes of data daily, so we decided to use Spring Batch partitioning.
Below is my configuration:
<job id="test" xmlns="http://www.springframework.org/schema/batch">
    <step id="masterStep">
        <partition step="step2" partitioner="multiPartioner">
            <handler grid-size="3" task-executor="taskExecutor" />
        </partition>
    </step>
</job>
<bean id="multiPartioner"
      class="org.springframework.batch.core.partition.support.MultiResourcePartitioner"
      scope="step">
    <property name="resources" value="file:#{jobParameters[fileDirectory]}/*" />
</bean>
<bean id="taskExecutor"
      class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    <property name="corePoolSize" value="10" />
    <property name="maxPoolSize" value="10" />
</bean>
<step id="step2" xmlns="http://www.springframework.org/schema/batch">
    <tasklet transaction-manager="transactionManager">
        <chunk reader="multiResourceItemReader" writer="testWriter"
               commit-interval="20000">
        </chunk>
    </tasklet>
</step>
When I set corePoolSize to 4, it works without any issues. But if I increase corePoolSize to 10, the job starts executing, but after some time (say 20 minutes) nothing more is executed: no logs, no errors and no status about what is happening. It just sits idle and makes no further progress.
Please help me resolve this issue.
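When a job goes silent with no logs or errors, a thread dump usually shows where the worker threads are stuck (running `jstack <pid>` against the JVM gives the same information). Below is a minimal stdlib sketch that prints the state and stack of threads whose names start with a given prefix. It assumes the pool threads are named with a prefix such as taskExecutor-; check the actual thread names in your logs. The class name is made up, and main only simulates a stuck worker.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;

public class WorkerThreadDump {
    // Returns name, state and stack of every live thread whose name starts with namePrefix,
    // similar to an excerpt from a jstack dump.
    public static String dump(String namePrefix) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            if (!t.getName().startsWith(namePrefix)) {
                continue;
            }
            sb.append('"').append(t.getName()).append("\" ").append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Simulate a stuck partition worker: a pool-like thread blocked forever on a latch.
        CountDownLatch never = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                never.await();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }, "taskExecutor-1");
        worker.setDaemon(true);
        worker.start();
        while (worker.getState() != Thread.State.WAITING) {
            Thread.yield(); // wait until the worker is parked inside await()
        }
        System.out.print(dump("taskExecutor-"));
    }
}
```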

Can we write a Spring Batch job without an ItemReader and ItemWriter?

In my project, I have written a Quartz scheduler with Spring Batch 2.2.
As per my requirement, I want to run a scheduler that fetches the application configuration properties to refresh the configuration cache on all the GlassFish clusters.
So I don't need an ItemReader and ItemWriter, which are used for file read/write operations.
So can I remove ItemReader and ItemWriter from ?
The configuration of my job is below:
<batch:job id="reportJob">
    <batch:step id="step1">
        <batch:tasklet>
            <!-- I want to remove ItemReader and ItemWriter as they are not used -->
            <batch:chunk reader="ItemReader" writer="ItemWriter"
                         commit-interval="10">
            </batch:chunk>
        </batch:tasklet>
    </batch:step>
    <batch:listeners>
        <batch:listener ref="simpleListener"/>
    </batch:listeners>
</batch:job>
<bean id="jobDetail" class="org.springframework.scheduling.quartz.JobDetailBean">
    <!-- Cache Refresh code is written here : JobLauncherDetails.java file -->
    <property name="jobClass" value="com.mkyong.quartz.JobLauncherDetails" />
    <property name="group" value="quartz-batch" />
    <property name="jobDataAsMap">
        <map>
            <entry key="jobName" value="reportJob" />
            <entry key="jobLocator" value-ref="jobRegistry" />
            <entry key="jobLauncher" value-ref="jobLauncher" />
            <entry key="param1" value="mkyong1" />
            <entry key="param2" value="mkyong2" />
        </map>
    </property>
</bean>
I am writing my business logic to refresh the cache in the job class JobLauncherDetails.java.
So is it possible to remove the ItemReader and ItemWriter? Is there an alternative?
Use a Tasklet:
<job id="reportJob">
    <step id="step1">
        <tasklet ref="MyTaskletBean" />
    </step>
    <!-- Other config... -->
</job>
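import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
public class MyTasklet implements Tasklet {
    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        // Put the cache-refresh logic here; no ItemReader/ItemWriter is needed.
        return RepeatStatus.FINISHED; // FINISHED: run once; CONTINUABLE: execute() is called again
    }
}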
You can read more about tasklets in chapter 5.2 of the official documentation.
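The RepeatStatus return value is what drives a tasklet step: the step keeps calling execute() until the tasklet reports FINISHED. The plain-Java sketch below models only that loop; Status is a local stand-in for RepeatStatus (not the real type), the class name is made up, and the real TaskletStep also runs each call in its own transaction and records step metadata.

```java
import java.util.function.Supplier;

public class TaskletLoopSketch {
    // Local stand-in for org.springframework.batch.repeat.RepeatStatus (not the real type).
    public enum Status { CONTINUABLE, FINISHED }

    // Simplified model of a tasklet step: call the tasklet until it reports FINISHED
    // and return how many times it was called.
    public static int runTaskletStep(Supplier<Status> tasklet) {
        int calls = 0;
        Status status;
        do {
            status = tasklet.get();
            calls++;
        } while (status == Status.CONTINUABLE);
        return calls;
    }

    public static void main(String[] args) {
        // Like the cache refresh: do the work once and finish.
        int once = runTaskletStep(() -> Status.FINISHED);
        // A hypothetical tasklet that handles three batches, one per call.
        int[] remaining = {3};
        int batched = runTaskletStep(() -> --remaining[0] > 0 ? Status.CONTINUABLE : Status.FINISHED);
        System.out.println(once + " call(s), then " + batched + " call(s)"); // 1 call(s), then 3 call(s)
    }
}
```

For a one-shot job like the cache refresh, returning FINISHED from the first call is all that is needed.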

Multiple input file Spring Batch

I'm trying to develop a Spring Batch job that can process a directory of files.
I looked at MultiResourcePartitioner and tried something like:
<job parent="loggerParent" id="importContractESTD" xmlns="http://www.springframework.org/schema/batch">
    <step id="multiImportContractESTD">
        <batch:partition step="partitionImportContractESTD" partitioner="partitioner">
            <batch:handler grid-size="5" task-executor="taskExecutor" />
        </batch:partition>
    </step>
</job>
<bean id="partitioner" class="org.springframework.batch.core.partition.support.MultiResourcePartitioner">
    <property name="keyName" value="inputfile" />
    <property name="resources" value="file:${import.contract.filePattern}" />
</bean>
<step id="partitionImportContractESTD" xmlns="http://www.springframework.org/schema/batch">
    <batch:job ref="importOneContractESTD" job-parameters-extractor="defaultJobParametersExtractor" />
</step>
<bean id="defaultJobParametersExtractor" class="org.springframework.batch.core.step.job.DefaultJobParametersExtractor"
      scope="step" />
<!-- Job importOneContractESTD definition -->
<job parent="loggerParent" id="importOneContractESTD" xmlns="http://www.springframework.org/schema/batch">
    <step parent="baseStep" id="initStep" next="calculateMD5">
        <tasklet ref="initTasklet" />
    </step>
    <step id="calculateMD5" next="importContract">
        <tasklet ref="md5Tasklet">
            <batch:listeners>
                <batch:listener ref="md5Tasklet" />
            </batch:listeners>
        </tasklet>
    </step>
    <step id="importContract">
        <tasklet>
            <chunk reader="contractReader" processor="contractProcessor" writer="contractWriter" commit-interval="${commit.interval}" />
            <batch:listeners>
                <batch:listener ref="contractProcessor" />
            </batch:listeners>
        </tasklet>
    </step>
</job>
<!-- Chunk definition : Contract ItemReader -->
<bean id="contractReader" class="com.sopra.banking.cirbe.acquisition.batch.AcquisitionFileReader" scope="step">
    <property name="resource" value="#{stepExecutionContext[inputfile]}" />
    <property name="lineMapper">
        <bean id="contractLineMappe" class="org.springframework.batch.item.file.mapping.PatternMatchingCompositeLineMapper">
            <property name="tokenizers">
                <map>
                    <entry key="1*" value-ref="headerTokenizer" />
                    <entry key="2*" value-ref="contractTokenizer" />
                </map>
            </property>
            <property name="fieldSetMappers">
                <map>
                    <entry key="1*" value-ref="headerMapper" />
                    <entry key="2*" value-ref="contractMapper" />
                </map>
            </property>
        </bean>
    </property>
</bean>
<!-- MD5 Tasklet -->
<bean id="md5Tasklet" class="com.sopra.banking.cirbe.acquisition.batch.AcquisitionMD5Tasklet">
    <property name="file" value="#{stepExecutionContext[inputfile]}" />
</bean>
But what I get is:
Caused by: org.springframework.expression.spel.SpelEvaluationException: EL1008E:(pos 0): Field or property 'stepExecutionContext' cannot be found on object of type 'org.springframework.beans.factory.config.BeanExpressionContext'
What I'm looking for is a way to launch my job importOneContractESTD for each file matched by file:${import.contract.filePattern}. Each file is shared between the step calculateMD5 (which puts the processed file's MD5 into my job context) and the step importContract (which reads that MD5 back from the job context and adds it as data to each line processed by the contractProcessor).
If I call importOneContractESTD with a single file given as a parameter (e.g. replacing #{stepExecutionContext[inputfile]} with ${my.file}), it works. But I would rather let Spring Batch manage the directory than my calling shell script.
Thanks for your ideas !
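Add scope="step" to any bean that needs to access stepExecutionContext. Expressions such as #{stepExecutionContext[inputfile]} can only be resolved when the bean is created inside a running step execution (late binding), which is what step scope provides. For example:
<bean id="md5Tasklet" class="com.sopra.banking.cirbe.acquisition.batch.AcquisitionMD5Tasklet" scope="step">
    <property name="file" value="#{stepExecutionContext[inputfile]}" />
</bean>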
More info here.
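For reference, here is a stdlib-only sketch of the partitioning itself, as I understand MultiResourcePartitioner to work: one partition per matching resource, named partition0, partition1, ..., each with its own context that maps keyName (here inputfile) to the resource location. Each worker step execution receives exactly one of these contexts, and only step-scoped beans in that step can read it through #{stepExecutionContext[inputfile]}. The class and method are hypothetical, plain Maps stand in for Spring's ExecutionContext, and the URI format produced by Path.toUri() may differ from what the real partitioner stores.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FilePartitionSketch {
    // One partition per file in `dir` matching `glob`; each partition's context maps
    // keyName to that file's location. Maps stand in for Spring's ExecutionContext.
    public static Map<String, Map<String, String>> partition(Path dir, String glob, String keyName) {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                files.add(p);
            }
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
        files.sort(null); // natural Path order, only to make this sketch deterministic
        Map<String, Map<String, String>> partitions = new LinkedHashMap<>();
        for (int i = 0; i < files.size(); i++) {
            Map<String, String> context = new LinkedHashMap<>();
            context.put(keyName, files.get(i).toUri().toString());
            partitions.put("partition" + i, context);
        }
        return partitions;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("contracts");
        Files.createFile(dir.resolve("contract1.txt"));
        Files.createFile(dir.resolve("contract2.txt"));
        // Each worker step execution would receive exactly one of these contexts.
        partition(dir, "*.txt", "inputfile")
                .forEach((name, ctx) -> System.out.println(name + " -> " + ctx));
    }
}
```

Note that the number of partitions follows the number of files, not grid-size.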
