Spring Batch - Disappearing threads while using Partitioner

I am using Spring Batch to process a large volume of data daily, so we decided to go with the Spring Batch partitioning concept.
Below is my configuration:
<job id="test" xmlns="http://www.springframework.org/schema/batch">
<step id="masterStep">
<partition step="step2" partitioner="multiPartioner">
<handler grid-size="3" task-executor="taskExecutor" />
</partition>
</step>
</job>
<bean id="multiPartioner"
class="org.springframework.batch.core.partition.support.MultiResourcePartitioner"
scope="step">
<property name="resources" value="file:#{jobParameters[fileDirectory]}/*" />
</bean>
<bean id="taskExecutor"
class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="10" />
<property name="maxPoolSize" value="10" />
</bean>
<step id="step2" xmlns="http://www.springframework.org/schema/batch">
<tasklet transaction-manager="transactionManager">
<chunk reader="multiResourceItemReader" writer="testWriter"
commit-interval="20000">
</chunk>
</tasklet>
</step>
When I specify corePoolSize as 4, it works fine without any issues. But if I increase corePoolSize to 10, it starts executing, and then after some time (say 20 minutes) nothing further is executed: no logs, no errors, no status about what is happening. It just sits idle with no further progress.
Please help me resolve this issue.

Related

Spring Batch: is this a tasklet or chunk?

I'm a little bit confused!
Spring Batch provides two different ways for implementing a job: using tasklets and chunks.
So, when I have this:
<tasklet>
<chunk
reader = 'itemReader'
processor = 'itemProcessor'
writer = 'itemWriter'
/>
</tasklet>
What kind of implementation is this? Tasklet? Chunk?
That's a chunk-oriented step, because inside the <tasklet> element there is a <chunk> element that defines a reader, a writer, and/or a processor.
Below is an example of a job that executes first a chunk-oriented step and then a tasklet step:
<job id="readMultiFileJob" xmlns="http://www.springframework.org/schema/batch">
<step id="step1" next="deleteDir">
<tasklet>
<chunk reader="multiResourceReader" writer="flatFileItemWriter"
commit-interval="1" />
</tasklet>
</step>
<step id="deleteDir">
<tasklet ref="fileDeletingTasklet" />
</step>
</job>
<bean id="fileDeletingTasklet" class="com.mkyong.tasklet.FileDeletingTasklet" >
<property name="directory" value="file:csv/inputs/" />
</bean>
<bean id="multiResourceReader"
class=" org.springframework.batch.item.file.MultiResourceItemReader">
<property name="resources" value="file:csv/inputs/domain-*.csv" />
<property name="delegate" ref="flatFileItemReader" />
</bean>
So you can see that the distinction is actually made at the level of individual steps, not of the entire job.

Fewer threads running in parallel - Spring Batch Remote Partitioning

I am working on a Spring Batch project where I have a file of 2 million records. I do some processing on the records and then save them to the database. The processing is time-costly, so I am using Spring Batch remote partitioning.
First I manually split the file into 15 files, and then, using MultiResourcePartitioner, I assign each file to a single thread. But what I noticed is that at the start only 4 threads run in parallel, and after some time the number of threads running in parallel keeps decreasing.
This is the configuration:
<batch:job id="GhanshyamESCatalogUpdater">
<batch:step id="GhanshyamCatalogUpdater2" >
<batch:partition step="slave" partitioner="rangePartitioner">
<batch:handler grid-size="15" task-executor="taskExecutor" />
</batch:partition>
</batch:step>
<batch:listeners>
<batch:listener ref="jobFailureListener"/>
</batch:listeners>
</batch:job>
<bean id="rangePartitioner" class="org.springframework.batch.core.partition.support.MultiResourcePartitioner" scope="step">
<property name="resources" value="file:#{jobParameters['job.partitionDir']}/x*">
</property>
</bean>
<step id="slave" xmlns="http://www.springframework.org/schema/batch">
<tasklet>
<chunk reader="gsbmyntraXmlReader" writer="gsbmyntraESWriter" commit-interval="1000" />
</tasklet>
</step>
This is the Task Executor:
<bean id="taskExecutor"
class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="100" />
<property name="allowCoreThreadTimeOut" value="true" />
<property name="WaitForTasksToCompleteOnShutdown" value="true" />
</bean>
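One detail worth flagging in the configuration above: on ThreadPoolTaskExecutor, allowCoreThreadTimeOut="true" applies the keep-alive timeout (60 seconds by default) to core threads as well, so idle core threads are discarded and the pool can shrink over time. Purely for comparison, a sketch of an executor with the core-thread timeout left at its default of false (the pool size of 15, one thread per partition file, is an assumption, not taken from the question):
<bean id="taskExecutor"
class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<!-- one thread per partitioned file; 15 matches the 15 input files (assumption) -->
<property name="corePoolSize" value="15" />
<property name="maxPoolSize" value="15" />
<!-- allowCoreThreadTimeOut defaults to false, so idle core threads stay alive -->
</bean>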

How to set property using "tasklet ref" tag

I have a tasklet ValidarSituacaoTasklet that has a property situacao. This tasklet is used in 2 steps with distinct values for situacao. I declared the steps like this:
and the bean:
<bean id="validarSituacaoTasklet" class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
</bean>
I have to pass 'situacao' to the tasklet.
I tried:
<step id="validaSituacaoStep">
<tasklet ref="validarSituacaoTasklet ">
<property name="situacao" value="EM_FECHAMENTO"/>
</tasklet>
</step>
but it does not seem to be the right way to do it.
Isn't this what you want?
<step id="validaSituacaoStep">
<tasklet ref="validarSituacaoTasklet "/>
</step>
<bean id="validarSituacaoTasklet" class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
<property name="situacao" value="EM_FECHAMENTO"/>
</bean>
UPDATE
Based on the comment left, this should work:
<step id="validaSituacaoStep">
<tasklet>
<bean class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
<property name="situacao" value="EM_FECHAMENTO"/>
</bean>
</tasklet>
</step>
Have you tried the following?
<bean id="validarSituacaoTasklet" class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
<property name="situacao" ref="daoBean"/>
</bean>
The DAO should be referenced in your bean's definition.
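Since the question states that the same tasklet class is used in two steps with distinct values for situacao, a variant of the bean-per-step approach shown above is also worth sketching (the second step id and the value FECHADO are hypothetical, not taken from the question):
<step id="validaSituacaoStep">
<tasklet ref="validarSituacaoEmFechamentoTasklet"/>
</step>
<step id="validaOutraSituacaoStep">
<tasklet ref="validarSituacaoFechadoTasklet"/>
</step>
<!-- one step-scoped bean per distinct value of 'situacao' -->
<bean id="validarSituacaoEmFechamentoTasklet" class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
<property name="situacao" value="EM_FECHAMENTO"/>
</bean>
<bean id="validarSituacaoFechadoTasklet" class="my.package.tasklet.ValidarSituacaoTasklet" scope="step">
<property name="situacao" value="FECHADO"/>
</bean>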

How to execute some partition step on all servers only once using Spring Batch partitioning?

I am using Spring Batch partitioning. I read exchanges from files and do some processing for each exchange.
The exchanges are distributed over 4 servers for parallel processing using Spring Batch partitioning.
The first step prepares input files containing the exchange ids. I need to read these ids on all servers.
Is there any way to run the first step on all servers exactly once, so the input files are prepared on every server?
I tried setting grid size = 4 (the number of servers) and consumer concurrency = 1, so that on each server only 1 consumer listens for step execution requests.
The problem is that more than one request can be handled by the same consumer, so the step runs more than once on some servers and doesn't run at all on others. As a result, the data is not prepared on some servers and the subsequent steps fail.
How can I make sure the step runs on every server exactly once?
Below is the configuration.
The import job has prepareExchangesListJob as its first step, which should work as explained above, and importExchanges as its second step, which is a normal partitioned step. After importExchanges there are many more steps, all normal partitioned steps.
<job id="importJob">
<step id="import.prepareExchangesListStep" next="import.importExchangesStep">
<job ref="prepareExchangesListJob" />
</step>
<step id="import.importExchangesStep">
<job ref="importExchangesJob" />
<listeners>
<listener ref="importExchangesStepNotifier" />
</listeners>
</step>
</job>
The PrepareExchangesList job; note the grid size = 4 (the number of servers) and consumer concurrency = 1, so that the step should execute only once on each server to prepare the input data (exchanges) on all servers.
<rabbit:template id="prepareExchangesListAmqpTemplate"
connection-factory="rabbitConnectionFactory" routing-key="prepareExchangesListQueue"
reply-timeout="${prepare.exchanges.list.step.timeout}">
</rabbit:template>
<int:channel id="prepareExchangesListOutboundChannel">
<int:dispatcher task-executor="taskExecutor" />
</int:channel>
<int:channel id="prepareExchangesListInboundStagingChannel" />
<amqp:outbound-gateway request-channel="prepareExchangesListOutboundChannel"
reply-channel="prepareExchangesListInboundStagingChannel"
amqp-template="prepareExchangesListAmqpTemplate"
mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />
<beans:bean id="prepareExchangesListMessagingTemplate"
class="org.springframework.integration.core.MessagingTemplate"
p:defaultChannel-ref="prepareExchangesListOutboundChannel"
p:receiveTimeout="${prepare.exchanges.list.step.timeout}" />
<beans:bean id="prepareExchangesListPartitioner"
class="org.springframework.batch.core.partition.support.SimplePartitioner"
scope="step" />
<beans:bean id="prepareExchangesListPartitionHandler"
class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler"
p:stepName="prepareExchangesListStep" p:gridSize="${prepare.exchanges.list.grid.size}"
p:messagingOperations-ref="prepareExchangesListMessagingTemplate" />
<int:aggregator ref="prepareExchangesListPartitionHandler"
send-partial-result-on-expiry="true"
send-timeout="${prepare.exchanges.list.step.timeout}"
input-channel="prepareExchangesListInboundStagingChannel" />
<amqp:inbound-gateway concurrent-consumers="1"
request-channel="prepareExchangesListInboundChannel" reply-channel="prepareExchangesListOutboundStagingChannel"
queue-names="prepareExchangesListQueue" connection-factory="rabbitConnectionFactory"
mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />
<int:channel id="prepareExchangesListInboundChannel" />
<int:service-activator ref="stepExecutionRequestHandler"
input-channel="prepareExchangesListInboundChannel" output-channel="prepareExchangesListOutboundStagingChannel" />
<int:channel id="prepareExchangesListOutboundStagingChannel" />
<beans:bean id="prepareExchangesFileItemReader"
class="org.springframework.batch.item.file.FlatFileItemReader"
p:resource="classpath:primary_markets.txt"
p:lineMapper-ref="stLineMapper" scope="step" />
<beans:bean id="prepareExchangesItemWriter"
class="com.st.batch.foundation.writers.PrepareExchangesItemWriter"
p:dirPath="${spring.tmp.batch.dir}/#{jobParameters['batch_id']}" p:numberOfFiles="4"
p:symfony-ref="symfonyStepScoped" scope="step" />
<step id="prepareExchangesListStep">
<tasklet transaction-manager="transactionManager">
<chunk reader="prepareExchangesFileItemReader" writer="prepareExchangesItemWriter" commit-interval="${prepare.exchanges.commit.interval}"/>
</tasklet>
</step>
<job id="prepareExchangesListJob" restartable="true">
<step id="prepareExchangesListStep.master">
<partition partitioner="prepareExchangesListPartitioner"
handler="prepareExchangesListPartitionHandler" />
</step>
</job>
Import Exchanges Job
<rabbit:template id="importExchangesAmqpTemplate"
connection-factory="rabbitConnectionFactory" routing-key="importExchangesQueue"
reply-timeout="${import.exchanges.partition.timeout}">
</rabbit:template>
<int:channel id="importExchangesOutboundChannel">
<int:dispatcher task-executor="taskExecutor" />
</int:channel>
<int:channel id="importExchangesInboundStagingChannel" />
<amqp:outbound-gateway request-channel="importExchangesOutboundChannel"
reply-channel="importExchangesInboundStagingChannel" amqp-template="importExchangesAmqpTemplate"
mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />
<beans:bean id="importExchangesMessagingTemplate"
class="org.springframework.integration.core.MessagingTemplate"
p:defaultChannel-ref="importExchangesOutboundChannel"
p:receiveTimeout="${import.exchanges.partition.timeout}" />
<beans:bean id="importExchangesPartitionHandler"
class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler"
p:stepName="importExchangesStep" p:gridSize="${import.exchanges.grid.size}"
p:messagingOperations-ref="importExchangesMessagingTemplate" />
<int:aggregator ref="importExchangesPartitionHandler"
send-partial-result-on-expiry="true"
send-timeout="${import.exchanges.step.timeout}"
input-channel="importExchangesInboundStagingChannel" />
<amqp:inbound-gateway concurrent-consumers="${import.exchanges.consumer.concurrency}"
request-channel="importExchangesInboundChannel" reply-channel="importExchangesOutboundStagingChannel"
queue-names="importExchangesQueue" connection-factory="rabbitConnectionFactory"
mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />
<int:channel id="importExchangesInboundChannel" />
<int:service-activator ref="stepExecutionRequestHandler"
input-channel="importExchangesInboundChannel" output-channel="importExchangesOutboundStagingChannel" />
<int:channel id="importExchangesOutboundStagingChannel" />
<beans:bean id="importExchangesItemWriter"
class="com.st.batch.foundation.writers.ImportExchangesAndEclsItemWriter"
p:symfony-ref="symfonyStepScoped" p:timeout="${import.exchanges.item.timeout}"
scope="step" />
<beans:bean id="importExchangesPartitioner"
class="org.springframework.batch.core.partition.support.MultiResourcePartitioner"
p:resources="file:${spring.tmp.batch.dir}/#{jobParameters['batch_id']}/exchanges/exchanges_*.txt"
scope="step" />
<beans:bean id="importExchangesFileItemReader"
class="org.springframework.batch.item.file.FlatFileItemReader"
p:resource="#{stepExecutionContext[fileName]}" p:lineMapper-ref="stLineMapper"
scope="step" />
<step id="importExchangesStep">
<tasklet transaction-manager="transactionManager">
<chunk reader="importExchangesFileItemReader" writer="importExchangesItemWriter" commit-interval="${import.exchanges.commit.interval}"/>
</tasklet>
</step>
<job id="importExchangesJob" restartable="true">
<step id="importExchangesStep.master">
<partition partitioner="importExchangesPartitioner"
handler="importExchangesPartitionHandler" />
</step>
</job>
Interesting technique.
I would expect the four partitions to be distributed evenly; rabbit typically does round-robin distribution to competing consumers (AFAIK), so I am not exactly sure why you're not seeing that behavior.
You could spend some time trying to figure it out, but the approach is fragile as long as you rely on that distribution: if one of the slaves had a network glitch, its partition would go to one of the others. It would be better to have each slave bind to a different queue and explicitly route the partitions by adding a routing key expression to the (first) outbound gateway...
routing-key-expression="'foo.' + headers['sequenceNumber']"
and have the slaves listen on foo.1, foo.2, etc., while continuing to use a common queue for the second step.
This assumes you are using the default exchange ("") and routing by queue name; if you have explicit bindings, you would use those in your routing key expression.
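Concretely, on the master side that might look something like this (a sketch adapted from the question's configuration; the foo.* queue names are illustrative and each of the four queues must exist):
<amqp:outbound-gateway request-channel="prepareExchangesListOutboundChannel"
reply-channel="prepareExchangesListInboundStagingChannel"
amqp-template="prepareExchangesListAmqpTemplate"
routing-key-expression="'foo.' + headers['sequenceNumber']" />
<!-- on slave N, consume only from that slave's own queue -->
<amqp:inbound-gateway concurrent-consumers="1"
request-channel="prepareExchangesListInboundChannel"
reply-channel="prepareExchangesListOutboundStagingChannel"
queue-names="foo.1" connection-factory="rabbitConnectionFactory" />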
PS: As a reminder, you need to increase the RabbitTemplate reply-timeout if your partitions take more than the default 5 seconds to complete.
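For example (the value is illustrative; reply-timeout is in milliseconds and defaults to 5000):
<rabbit:template id="prepareExchangesListAmqpTemplate"
connection-factory="rabbitConnectionFactory"
routing-key="prepareExchangesListQueue"
reply-timeout="3600000" />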

Multiple input files in Spring Batch

I'm trying to develop a batch job which processes a directory containing files with Spring Batch.
I looked at the MultiResourcePartitioner and tried something like:
<job parent="loggerParent" id="importContractESTD" xmlns="http://www.springframework.org/schema/batch">
<step id="multiImportContractESTD">
<batch:partition step="partitionImportContractESTD" partitioner="partitioner">
<batch:handler grid-size="5" task-executor="taskExecutor" />
</batch:partition>
</step>
</job>
<bean id="partitioner" class="org.springframework.batch.core.partition.support.MultiResourcePartitioner">
<property name="keyName" value="inputfile" />
<property name="resources" value="file:${import.contract.filePattern}" />
</bean>
<step id="partitionImportContractESTD" xmlns="http://www.springframework.org/schema/batch">
<batch:job ref="importOneContractESTD" job-parameters-extractor="defaultJobParametersExtractor" />
</step>
<bean id="defaultJobParametersExtractor" class="org.springframework.batch.core.step.job.DefaultJobParametersExtractor"
scope="step" />
<!-- Job importContractESTD definition -->
<job parent="loggerParent" id="importOneContractESTD" xmlns="http://www.springframework.org/schema/batch">
<step parent="baseStep" id="initStep" next="calculateMD5">
<tasklet ref="initTasklet" />
</step>
<step id="calculateMD5" next="importContract">
<tasklet ref="md5Tasklet">
<batch:listeners>
<batch:listener ref="md5Tasklet" />
</batch:listeners>
</tasklet>
</step>
<step id="importContract">
<tasklet>
<chunk reader="contractReader" processor="contractProcessor" writer="contractWriter" commit-interval="${commit.interval}" />
<batch:listeners>
<batch:listener ref="contractProcessor" />
</batch:listeners>
</tasklet>
</step>
</job>
<!-- Chunk definition : Contract ItemReader -->
<bean id="contractReader" class="com.sopra.banking.cirbe.acquisition.batch.AcquisitionFileReader" scope="step">
<property name="resource" value="#{stepExecutionContext[inputfile]}" />
<property name="lineMapper">
<bean id="contractLineMappe" class="org.springframework.batch.item.file.mapping.PatternMatchingCompositeLineMapper">
<property name="tokenizers">
<map>
<entry key="1*" value-ref="headerTokenizer" />
<entry key="2*" value-ref="contractTokenizer" />
</map>
</property>
<property name="fieldSetMappers">
<map>
<entry key="1*" value-ref="headerMapper" />
<entry key="2*" value-ref="contractMapper" />
</map>
</property>
</bean>
</property>
</bean>
<!-- MD5 Tasklet -->
<bean id="md5Tasklet" class="com.sopra.banking.cirbe.acquisition.batch.AcquisitionMD5Tasklet">
<property name="file" value="#{stepExecutionContext[inputfile]}" />
</bean>
But what I get is:
Caused by: org.springframework.expression.spel.SpelEvaluationException: EL1008E:(pos 0): Field or property 'stepExecutionContext' cannot be found on object of type 'org.springframework.beans.factory.config.BeanExpressionContext'
What I'm looking for is a way to launch my job importOneContractESTD for each file contained in file:${import.contract.filePattern}. Each file is shared between the step calculateMD5 (which puts the processed file's MD5 into the job context) and the step importContract (which reads that MD5 back from the job context and adds it as data to each line processed by the contractProcessor).
If I simply call importOneContractESTD with a single file given as a parameter (e.g. replacing #{stepExecutionContext[inputfile]} with ${my.file}), it works... But I want to use Spring Batch to manage my directory rather than my calling shell script...
Thanks for your ideas!
Add scope="step" when you need to access stepExecutionContext, like here:
<bean id="md5Tasklet" class="com.sopra.banking.cirbe.acquisition.batch.AcquisitionMD5Tasklet" scope="step">
<property name="file" value="#{stepExecutionContext[inputfile]}" />
</bean>
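For context on why this works: MultiResourcePartitioner stores each matched resource's URL in the corresponding partition's step execution context under the key configured by keyName ('inputfile' here), and scope="step" defers bean creation until a step execution (and thus its context) actually exists. Restating the relevant pair from the configuration above:
<bean id="partitioner" class="org.springframework.batch.core.partition.support.MultiResourcePartitioner">
<!-- each partition gets: inputfile -> URL of one matched resource -->
<property name="keyName" value="inputfile" />
<property name="resources" value="file:${import.contract.filePattern}" />
</bean>
<!-- step scope lets #{stepExecutionContext[inputfile]} resolve per partition -->
<bean id="md5Tasklet" class="com.sopra.banking.cirbe.acquisition.batch.AcquisitionMD5Tasklet" scope="step">
<property name="file" value="#{stepExecutionContext[inputfile]}" />
</bean>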
More info is in the Spring Batch reference documentation under "Late Binding of Job and Step Attributes".
