Which to choose in my case: ThreadPoolTaskExecutor or SimpleAsyncTaskExecutor? - spring

I am working on an existing application which has this piece of code:
<bean id="taskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="50" />
<property name="maxPoolSize" value="200" />
<property name="queueCapacity" value="250" />
</bean>
We have a method which uses the above taskExecutor to complete a particular task.
(This task must be completed, but it can be completed asynchronously.)
This particular task is responsible for inserting 100 documents into the database.
So I was planning to use SimpleAsyncTaskExecutor instead of ThreadPoolTaskExecutor.
Please let me know if this will impact performance or create any issues.
Our application is multithreaded, and there will be approximately 700 users at any time.
So I don't know how it behaves in a production environment (it may be fine during development).

As far as I know, using SimpleAsyncTaskExecutor makes sense for long-running tasks, e.g. if you want to compress log files at the end of each day. In other cases, if you want to execute a short-running task every n seconds or minutes, you should use ThreadPoolTaskExecutor, because it reuses system resources.
Technically both variants will work, but I would use ThreadPoolTaskExecutor for your task.
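To make the difference concrete, here is a minimal Java sketch (the class name and pool sizes are illustrative, not from the question): SimpleAsyncTaskExecutor spawns a brand-new thread for every submitted task, while ThreadPoolTaskExecutor hands tasks to a fixed, reusable pool.

import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

public class ExecutorDemo {
    public static void main(String[] args) {
        Runnable task = () -> System.out.println(Thread.currentThread().getName());

        // SimpleAsyncTaskExecutor: creates a new thread per task, no pooling, no queue
        SimpleAsyncTaskExecutor simple = new SimpleAsyncTaskExecutor("simple-");
        for (int i = 0; i < 3; i++) {
            simple.execute(task); // prints simple-1, simple-2, simple-3
        }

        // ThreadPoolTaskExecutor: threads are pooled and reused, overflow waits in the queue
        ThreadPoolTaskExecutor pool = new ThreadPoolTaskExecutor();
        pool.setCorePoolSize(2);
        pool.setMaxPoolSize(4);
        pool.setQueueCapacity(10);
        pool.initialize();
        for (int i = 0; i < 3; i++) {
            pool.execute(task); // the 2 core threads are reused across tasks
        }
        pool.shutdown();
    }
}

With roughly 700 concurrent users, the unbounded thread creation of SimpleAsyncTaskExecutor is exactly what the bounded pool in the existing configuration protects you from.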

Related

How to override Transaction service timeout value in WAS Console by code?

In my XML file, I have the following:
<bean id="transactionManager"
class="org.springframework.transaction.jta.WebSphereUowTransactionManager"
p:defaultTimeout="60" />
<bean id="sharedTransactionTemplate"
class="org.springframework.transaction.support.TransactionTemplate">
<constructor-arg>
<ref bean="transactionManager" />
</constructor-arg>
<property name="isolationLevelName" value="${sharedTransactionTemplate.isolationlevel:ISOLATION_READ_UNCOMMITTED}"/>
<property name="timeout" value="60"/>
</bean>
With the value 60, my program hits a timeout if the response from the DB takes more than 60 seconds. This is correct and also what I expected.
I also found that there is a transaction timeout setting in the WAS Console:
Server --> WebSphere application servers --> my server
Under Container Settings --> click on Container Services --> Transaction service
On the Transaction service page, there is a value called "Total transaction lifetime timeout". I set this value to 80.
In my application, one part triggers Spring's SimpleJobLauncher to run a Spring Batch job. In that batch job, I have a for loop which writes some data to a log file and does not interact with the DB at all.
I found that this for loop does not hit the 60-second timeout after 60 seconds; it only hits the 80-second timeout. I believe that is because it doesn't call the DB.
My code is something as follows:
@Autowired
@Qualifier("sharedTransactionTemplate")
private TransactionTemplate transactionTemplate;

transactionTemplate.execute(new TransactionCallbackWithoutResult() {
    @Override
    protected void doInTransactionWithoutResult(TransactionStatus status) {
        // In here I trigger the spring batch
    }
});
I would like to change this value to, for example, 70 seconds in the XML (or in any other way in code). I do not want to edit it in the WAS Console, because I still want the other methods to use the 80 seconds.
Any ideas?
Here is what my Spring Batch job does:
Call the DB and update something. (done with no error)
Reader: read data from the DB. (done with no error)
Before the write, I have a for loop which does not call the DB. --> It hits the timeout here, and I found that the timeout value used is the one set in the WAS Console, not the value set in the XML.
and so on...
I want something I can configure in the XML, so that this Spring Batch job uses my own value and step 3 above respects it.
An additional question: are the following classes only applicable to transactions that involve a database connection?
class="org.springframework.transaction.jta.WebSphereUowTransactionManager"
class="org.springframework.transaction.support.TransactionTemplate"
It is unclear to me from the info you've provided why you are executing your Spring Batch job transactionally; you may want to consider whether you need to. Although not a duplicate, your question is similar to this one, in which you can see that one possible solution is to start a UserTransaction for your Spring Batch job, whose timeout you can control. As pointed out in that answer and the subsequent comments, there are some limitations and considerations with this method.
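A minimal sketch of that approach, assuming the standard Java EE JNDI name and leaving the actual job launch to you (this would replace the transactionTemplate.execute(...) wrapper):

import javax.naming.InitialContext;
import javax.transaction.UserTransaction;

public class BatchLauncher {
    public void launchWithOwnTimeout() throws Exception {
        UserTransaction utx = (UserTransaction) new InitialContext().lookup("java:comp/UserTransaction");
        utx.setTransactionTimeout(70); // seconds; must be set before begin()
        utx.begin();
        try {
            // trigger the Spring Batch job here
            utx.commit();
        } catch (Exception e) {
            utx.rollback();
            throw e;
        }
    }
}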

Prevent duplicates across restarts in spring integration

I have to poll a directory and write the entries to an RDBMS.
I wired up a Redis metadata store for the duplicates check. I see that the framework updates the Redis store with entries for all files in the folder [~140 files], well before the RDBMS entries get written. At the time of application termination, the RDBMS has logged only 90 files. On application restart, no more files are picked up from the folder.
Properties: msgs.per.poll=10, polling.interval=2000
How can I ensure that entries are made to Redis only after writing to the DB, so that both are in sync and I don't miss any files?
<task:executor id="executor" pool-size="5" />
<int-file:inbound-channel-adapter channel="filesIn" directory="${input.Dir}" scanner="dirScanner" filter="compositeFileFilter" prevent-duplicates="true">
<int:poller fixed-delay="${polling.interval}" max-messages-per-poll="${msgs.per.poll}" task-executor="executor">
</int:poller>
</int-file:inbound-channel-adapter>
<int:channel id="filesIn" />
<bean id="dirScanner" class="org.springframework.integration.file.RecursiveLeafOnlyDirectoryScanner" />
<bean id="compositeFileFilter" class="org.springframework.integration.file.filters.CompositeFileListFilter">
<constructor-arg ref="persistentFilter" />
</bean>
<bean id="persistentFilter" class="org.springframework.integration.file.filters.FileSystemPersistentAcceptOnceFileListFilter">
<constructor-arg ref="metadataStore" />
</bean>
<bean name="metadataStore" class="org.springframework.integration.redis.metadata.RedisMetadataStore">
<constructor-arg name="connectionFactory" ref="redisConnectionFactory"/>
</bean>
<bean id="redisConnectionFactory" class="org.springframework.data.redis.connection.jedis.JedisConnectionFactory" p:hostName="localhost" p:port="6379" />
<int-jdbc:outbound-channel-adapter channel="filesIn" data-source="dataSource" query="insert into files values (:path,:name,:size,:crDT,:mdDT,:id)"
sql-parameter-source-factory="spelSource">
</int-jdbc:outbound-channel-adapter>
....
Artem is correct; alternatively, you could extend the RedisMetadataStore and flush the entries that are not in your database at initialization time. This way you could keep using Redis and stay in sync with the DB, but it couples things a little.
How can I ensure entries to redis are made after writing to db
It isn't possible, because FileSystemPersistentAcceptOnceFileListFilter works before any message is sent, and only once, when FileReadingMessageSource.toBeReceived is empty. Of course, it tries to refetch the files on the next application restart, but it can't, because your RedisMetadataStore already contains entries for those files.
I don't think you have any choice in your case but to use a custom JdbcFileListFilter based on your files table. Fortunately, your logic ends up with a file entry anyway.
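A minimal sketch of such a filter, assuming the first column of your files table is called path (inferred from the :path parameter in the insert query above); everything else is illustrative:

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;
import org.springframework.integration.file.filters.FileListFilter;
import org.springframework.jdbc.core.JdbcTemplate;

public class JdbcFileListFilter implements FileListFilter<File> {

    private final JdbcTemplate jdbcTemplate;

    public JdbcFileListFilter(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    @Override
    public List<File> filterFiles(File[] files) {
        List<File> accepted = new ArrayList<File>();
        for (File file : files) {
            // accept only files that have not yet been written to the files table
            int count = jdbcTemplate.queryForObject(
                    "select count(*) from files where path = ?", Integer.class, file.getAbsolutePath());
            if (count == 0) {
                accepted.add(file);
            }
        }
        return accepted;
    }
}

Because this filter consults the same table the outbound adapter writes to, a file only counts as "seen" once its row actually exists, which is exactly the ordering the question asks for.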

Spring batch admin remote partition steps running maximum 8 threads even though concurrency is 10?

I am using Spring Batch remote partitioning for my batch process, and I am launching jobs using Spring Batch Admin.
I have set the inbound gateway consumer concurrency to 10, but the maximum number of partitions running in parallel is 8.
I want to increase the consumer concurrency to 15 later on.
Below is my configuration:
<task:executor id="taskExecutor" pool-size="50" />
<rabbit:template id="computeAmqpTemplate"
connection-factory="rabbitConnectionFactory" routing-key="computeQueue"
reply-timeout="${compute.partition.timeout}">
</rabbit:template>
<int:channel id="computeOutboundChannel">
<int:dispatcher task-executor="taskExecutor" />
</int:channel>
<int:channel id="computeInboundStagingChannel" />
<amqp:outbound-gateway request-channel="computeOutboundChannel"
reply-channel="computeInboundStagingChannel" amqp-template="computeAmqpTemplate"
mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />
<beans:bean id="computeMessagingTemplate"
class="org.springframework.integration.core.MessagingTemplate"
p:defaultChannel-ref="computeOutboundChannel"
p:receiveTimeout="${compute.partition.timeout}" />
<beans:bean id="computePartitionHandler"
class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler"
p:stepName="computeStep" p:gridSize="${compute.grid.size}"
p:messagingOperations-ref="computeMessagingTemplate" />
<int:aggregator ref="computePartitionHandler"
send-partial-result-on-expiry="true" send-timeout="${compute.step.timeout}"
input-channel="computeInboundStagingChannel" />
<amqp:inbound-gateway concurrent-consumers="${compute.consumer.concurrency}"
request-channel="computeInboundChannel"
reply-channel="computeOutboundStagingChannel" queue-names="computeQueue"
connection-factory="rabbitConnectionFactory"
mapped-request-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS"
mapped-reply-headers="correlationId, sequenceNumber, sequenceSize, STANDARD_REQUEST_HEADERS" />
<int:channel id="computeInboundChannel" />
<int:service-activator ref="stepExecutionRequestHandler"
input-channel="computeInboundChannel" output-channel="computeOutboundStagingChannel" />
<int:channel id="computeOutboundStagingChannel" />
<beans:bean id="computePartitioner"
class="org.springframework.batch.core.partition.support.MultiResourcePartitioner"
p:resources="file:${spring.tmp.batch.dir}/#{jobParameters[batch_id]}/shares_rics/shares_rics_*.txt"
scope="step" />
<beans:bean id="computeFileItemReader"
class="org.springframework.batch.item.file.FlatFileItemReader"
p:resource="#{stepExecutionContext[fileName]}" p:lineMapper-ref="stLineMapper"
scope="step" />
<beans:bean id="computeItemWriter"
class="com.st.batch.foundation.writers.ComputeItemWriter"
p:symfony-ref="symfonyStepScoped" p:timeout="${compute.item.timeout}"
p:batchId="#{jobParameters[batch_id]}" scope="step" />
<step id="computeStep">
<tasklet transaction-manager="transactionManager">
<chunk reader="computeFileItemReader" writer="computeItemWriter"
commit-interval="${compute.commit.interval}" />
</tasklet>
</step>
<flow id="computeFlow">
<step id="computeStep.master">
<partition partitioner="computePartitioner"
handler="computePartitionHandler" />
</step>
</flow>
<job id="computeJob" restartable="true">
<flow id="computeJob.computeFlow" parent="computeFlow" />
</job>
compute.grid.size = 112
compute.consumer.concurrency = 10
The input files are split into 112 equal parts = compute.grid.size = the total number of partitions.
Number of servers = 4.
There are 2 problems:
i) Even though I have set the concurrency to 10, the maximum number of threads running is 8.
ii) Some servers are slower because other processes run on them, and some are faster, so I want to make sure step executions are distributed fairly, i.e. if the faster servers are done with their executions, the remaining executions in the queue should go to them. They should not be distributed in round-robin fashion.
I know RabbitMQ has a prefetch count setting and an ack mode to distribute work fairly. For Spring Integration, the prefetch count is 1 and the ack mode is AUTO by default. But still, some servers keep running more partitions even though other servers have been done for a long time. Ideally, no server should sit idle.
Update:
One more thing I have now observed: some steps which run in parallel using a split (not distributed using remote partitioning) also run a maximum of 8 in parallel. It looks like a thread pool limit issue, but as you can see the taskExecutor has pool-size set to 50.
Is there anything in spring-batch/spring-batch-admin which limits the number of concurrently running steps?
2nd Update:
Also, if there are 8 or more threads running in parallel processing items, Spring Batch Admin doesn't load; it just hangs. If I reduce the concurrency, Spring Batch Admin loads. I even tested it with concurrency 4 on one server and 8 on the other: Spring Batch Admin doesn't load if I use the URL of the server where 8 threads are running, but it works on the server where 4 threads are running.
The Spring Batch Admin manager has the following jobLauncher configuration:
<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
<property name="jobRepository" ref="jobRepository" />
<property name="taskExecutor" ref="jobLauncherTaskExecutor" />
</bean>
<task:executor id="jobLauncherTaskExecutor" pool-size="6" rejection-policy="ABORT" />
The pool size there is 6; does it have anything to do with the above problem?
Or is there anything in Tomcat 7 which restricts the number of running threads to 8?
Are you using a database for the JobRepository?
During the execution, the batch framework persists step executions, and the number of connections to the JobRepository database can interfere with parallel step executions.
A concurrency of 8 makes me think you might be using BasicDataSource, whose default pool size is 8. If so, switch to something like DriverManagerDataSource and see.
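For reference, commons-dbcp's BasicDataSource defaults to 8 active connections (maxActive), which lines up with the observed ceiling. A sketch of simply raising the limit instead (driver and URL are placeholders, not from the question):

import org.apache.commons.dbcp.BasicDataSource;

public class DataSourceConfig {
    public BasicDataSource jobRepositoryDataSource() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("com.mysql.jdbc.Driver"); // placeholder
        ds.setUrl("jdbc:mysql://localhost:3306/batch"); // placeholder
        ds.setMaxActive(50); // DBCP 1.x defaults to 8; size this to match your parallel steps
        return ds;
    }
}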
Confused - you said "I have set the concurrency to 10", but at the time you showed compute.consumer.concurrency = 8, which would mean it was working as configured. It is impossible to have only 8 consumer threads if the property is really set to 10.
From Rabbit's perspective, all consumers are equal - if there are 10 consumers on a slow box and 10 consumers on a fast box, and you only have 10 partitions, it is possible that all 10 partitions will end up on the slow box.
RabbitMQ does not distribute work across servers; it distributes work across consumers only.
You might get better distribution by reducing the concurrency. You should also set the concurrency lower on the slower boxes.
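Programmatically, the knobs being described look like this in Spring AMQP (a sketch; the setters are real, but the values and the wired rabbitConnectionFactory are assumptions):

import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;

public class ConsumerConfig {
    public SimpleMessageListenerContainer computeContainer(ConnectionFactory rabbitConnectionFactory) {
        SimpleMessageListenerContainer container = new SimpleMessageListenerContainer(rabbitConnectionFactory);
        container.setQueueNames("computeQueue");
        container.setConcurrentConsumers(4); // lower than 10, and lower still on the slow boxes
        container.setPrefetchCount(1);       // a busy consumer won't buffer extra partitions
        return container;
    }
}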

spring batch job parameters to recurring operation

I have a basic Spring Batch (spring-core-3.1.1) application set up and running with the Quartz scheduler (1.8.6). It looks like this:
- the Spring Batch job has a MySQL datasource to save job state in the Spring Batch schema
- the job reader is a CSV file reader using the class org.springframework.batch.item.file.FlatFileItemReader
- the writer is a simple custom ItemWriter (output goes to the console)
- the Quartz scheduler is used to set up a CronTrigger along with the JobDetail bean
- the scheduler runs the job every 10 seconds (*/10 * * * * ?)
I want to customize this setup so that each job instance reads only X lines of the CSV file instead of the whole file, e.g. if there are 10 lines in the file and I want to read 2 lines per step, then each job instance should read only 2 lines instead of all 10 at once. For that I want to give the job dynamic params based on the number of lines already read, so that each job execution has unique, incrementing params - like a cursor into the file for the reader.
How can I achieve this?
My JobDetail property for the param:
<property name="jobDataAsMap">
    <map>
        <entry key="jobName" value="reportJob" />
        <entry key="jobLocator" value-ref="jobRegistry" />
        <entry key="jobLauncher" value-ref="jobLauncher" />
        <entry key="cursor" value="0"/>
        <!-- Gives error on this one: <entry key="cursor" value="#{jobParameters['cursor']}"/> -->
    </map>
</property>
You can read the parameters of the latest successfully completed job to detect the latest starting line, then add 2 to that value and launch the new job.
You can read the job metadata using JobExplorer, or query spring-batch's meta-data tables directly.
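A hedged sketch of the JobExplorer route (the "cursor" key matches the jobDataAsMap above; the jobExplorer, jobLauncher and job references are assumed to be wired elsewhere):

// Assumes at least one prior completed execution; add empty-list guards in real code.
JobInstance lastInstance = jobExplorer.getJobInstances("reportJob", 0, 1).get(0); // newest first
JobExecution lastExecution = jobExplorer.getJobExecutions(lastInstance).get(0);
long lastCursor = lastExecution.getJobParameters().getLong("cursor");
JobParameters nextParams = new JobParametersBuilder()
        .addLong("cursor", lastCursor + 2) // advance by the number of lines read per run
        .toJobParameters();
jobLauncher.run(job, nextParams);

Since the parameters differ on every run, each launch creates a fresh JobInstance, which is what makes a recurring Quartz trigger compatible with Spring Batch's unique-parameters rule.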

Purpose of taskExecutor property in Spring's DefaultMessageListenerContainer

Spring's DefaultMessageListenerContainer (DMLC) has concurrentConsumers and taskExecutor properties. The taskExecutor bean can be given a corePoolSize property. What, then, is the difference between specifying concurrentConsumers and corePoolSize? When the concurrentConsumers property is defined, Spring creates the specified number of consumers/message listeners to process messages. Where does corePoolSize come into the picture?
Code snippet
<bean id="myMessageListener"
class="org.springframework.jms.listener.DefaultMessageListenerContainer">
<property name="connectionFactory" ref="connectionFactory" />
<property name="destination" ref="myQueue" />
<property name="messageListener" ref="myListener" />
<property name="cacheLevelName" value="CACHE_CONSUMER"/>
<property name="maxConcurrentConsumers" value="10"/>
<property name="concurrentConsumers" value="3"/>
<property name="taskExecutor" ref="myTaskExecutor"/>
</bean>
<bean id="myTaskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor" >
<property name="corePoolSize" value="100"/>
<property name="maxPoolSize" value="100"/>
<property name="keepAliveSeconds" value="30"/>
<property name="threadNamePrefix" value="myTaskExecutor"/>
</bean>
As of version 4.3.6, the taskExecutor contains instances of AsyncMessageListenerInvoker, which are responsible for message processing. corePoolSize is the number of physical threads in the defined pool, while concurrentConsumers is the number of tasks submitted to that pool. I guess this abstraction was designed for more flexible control.
The purpose of the taskExecutor property:
Set the Spring TaskExecutor to use for running the listener threads.
Default is a SimpleAsyncTaskExecutor, starting up a number of new threads, according to the specified number of concurrent consumers.
Specify an alternative TaskExecutor for integration with an existing thread pool.
The above is from the Spring official documentation.
When you specify an alternative task executor, the listener threads use the defined task executor instead of the default SimpleAsyncTaskExecutor.
This is easily illustrated by defining two JMS listeners with the same containerFactory: when you specify the concurrency, the taskExecutor's corePoolSize and maxPoolSize must support that concurrency.
If you set the concurrency to 5-20 and you have two listeners, you should set the corePoolSize to more than 10 and the maxPoolSize to more than 40, so that the listeners can get threads according to their concurrency limits.
If, in this case, you set the maxPoolSize to less than 10, the listener containers will not be able to scale up to their configured consumers, and Spring will log the following warning as well:
The number of scheduled consumers has dropped below concurrent consumers limit, probably due to tasks having been rejected. Check your thread pool configuration! Automatic recovery to be triggered by remaining consumers.
Basically, the listener threads act according to the taskExecutor property.
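The sizing rule above, sketched in plain Java (the 5-20 figures follow the example; the connectionFactory is an assumed bean):

import javax.jms.ConnectionFactory;
import org.springframework.jms.listener.DefaultMessageListenerContainer;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

public class ListenerConfig {
    public DefaultMessageListenerContainer container(ConnectionFactory connectionFactory) {
        ThreadPoolTaskExecutor exec = new ThreadPoolTaskExecutor();
        exec.setCorePoolSize(10); // >= 2 listeners x 5 minimum consumers
        exec.setMaxPoolSize(40);  // >= 2 listeners x 20 maximum consumers
        exec.initialize();

        DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);
        container.setDestinationName("myQueue");
        container.setConcurrentConsumers(5);
        container.setMaxConcurrentConsumers(20);
        container.setTaskExecutor(exec); // listener threads now come from this pool
        return container;
    }
}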
