Spring Batch job parameters for a recurring operation - spring

I have a basic Spring Batch job application (spring-core-3.1.1) running with the Quartz scheduler (1.8.6). It looks like this:
- The Spring Batch job has a MySQL datasource to save job state in the Spring Batch schema
- The job reader is a CSV file reader using the class org.springframework.batch.item.file.FlatFileItemReader
- The writer is a simple custom ItemWriter (output goes to the console)
- The Quartz scheduler is used to set up a CronTrigger along with a JobDetail bean
- The scheduler runs the job every 10 seconds (*/10 * * * * ?)
I want to customize this setup so that each job instance reads only X lines of the CSV file instead of the whole file. For example, if the file has 10 lines and I want to read 2 lines per step, the job instance should read only 2 lines instead of all 10 at once. To do that, I want to give the job dynamic parameters based on the number of lines already read, so that each job execution gets unique, incrementing parameters, like a cursor for the file reader.
How can I achieve this?
My JobDetail property for the parameters:
<property name="jobDataAsMap">
<map>
<entry key="jobName" value="reportJob" />
<entry key="jobLocator" value-ref="jobRegistry" />
<entry key="jobLauncher" value-ref="jobLauncher" />
<entry key="cursor" value="0"/>
<!-- Gives error on this one: <entry key="cursor" value="#{jobParameters['cursor']}"/> -->
</map>
</property>

You can read the parameters of the latest successfully completed job execution to find its starting line, add 2 to that value, and launch the new job with the result.
You can read the job metadata using JobExplorer or by querying Spring Batch's metadata tables directly.
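A minimal sketch of the cursor arithmetic described in this answer. It is plain Java and does not call Spring Batch: the class name CursorCalculator and the method nextCursor are hypothetical, and the lastCursor argument stands in for the "cursor" parameter you would look up from the last completed execution (via JobExplorer or the metadata tables).

```java
/** Hypothetical helper: derives the next "cursor" job parameter from the previous one. */
class CursorCalculator {

    /**
     * @param lastCursor  starting line of the last completed run, or null if the job never ran
     * @param linesPerRun number of lines each job instance should read
     * @return starting line for the next run
     */
    static long nextCursor(Long lastCursor, int linesPerRun) {
        if (linesPerRun <= 0) {
            throw new IllegalArgumentException("linesPerRun must be positive");
        }
        // First run starts at line 0; later runs continue where the previous one stopped.
        return lastCursor == null ? 0L : lastCursor + linesPerRun;
    }

    public static void main(String[] args) {
        Long last = null; // assumption: no completed execution exists yet
        for (int run = 1; run <= 5; run++) {
            long cursor = nextCursor(last, 2);
            System.out.println("run " + run + " starts at line " + cursor);
            last = cursor;
        }
    }
}
```

Because the cursor value changes on every launch, each launch also gets distinct job parameters. The value could then feed a step-scoped reader, for example through its linesToSkip and maxItemCount properties, though that wiring is not shown here.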

Related

Spring Integration - Wait till finishes processing file

The MyHandler class takes roughly 10-20 seconds to process a huge 200MB csv/txt file. If I drop a file in the 'my.test.dir' directory, MyHandler keeps picking up the same file multiple times. To avoid this, I set prevent-duplicates to false. But I might get a file with the same name again after some time, and files with the same name are not picked up later. Please suggest how to handle this scenario. MyHandler has to wait until it finishes processing the file.
<bean id="test-file-bean" class="com.test.MyHandler"/>
<int-file:inbound-channel-adapter
id="test-adapter-inbound"
directory="${my.test.dir}"
channel="test-file-channel"
filter="test-file-filter"
prevent-duplicates="false" auto-startup="true"
auto-create-directory="true">
<int:poller fixed-delay="5"/>
</int-file:inbound-channel-adapter>
<int:service-activator
input-channel="test-file-channel" ref="test-file-bean" method="handleFlow"/>
Thanks.
Consider using a FileSystemPersistentAcceptOnceFileListFilter, which prevents duplicates but still passes files whose timestamp has changed.
See the docs for more info: https://docs.spring.io/spring-integration/docs/current/reference/html/file.html#file-reading.
There you can also find a ChainFileListFilter if you need to combine it with your own filter.
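To illustrate the idea behind that filter (accept a file once, but accept it again when its timestamp changes), here is a plain-Java sketch. It is not Spring Integration's implementation: the class TimestampAwareAcceptOnce and its accept method are hypothetical, and an in-memory map stands in for the persistent metadata store the real filter uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustration only: accepts a file name once per distinct last-modified timestamp. */
class TimestampAwareAcceptOnce {

    // Assumption: an in-memory map replaces the persistent metadata store.
    private final Map<String, Long> seen = new ConcurrentHashMap<>();

    /** Returns true if the file is new or its timestamp differs from the one recorded earlier. */
    boolean accept(String fileName, long lastModified) {
        Long previous = seen.put(fileName, lastModified);
        return previous == null || previous != lastModified;
    }

    public static void main(String[] args) {
        TimestampAwareAcceptOnce filter = new TimestampAwareAcceptOnce();
        System.out.println(filter.accept("data.csv", 1000L)); // new file
        System.out.println(filter.accept("data.csv", 1000L)); // same file seen again by the poller
        System.out.println(filter.accept("data.csv", 2000L)); // same name, newer timestamp
    }
}
```

With this behaviour the poller can see the same unchanged file repeatedly without re-processing it, while a later file with the same name and a newer timestamp still gets through.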

How to override Transaction service timeout value in WAS Console by code?

In my XML file, I have something like the following:
<bean id="transactionManager"
class="org.springframework.transaction.jta.WebSphereUowTransactionManager"
p:defaultTimeout="60" />
<bean id="sharedTransactionTemplate"
class="org.springframework.transaction.support.TransactionTemplate">
<constructor-arg>
<ref bean="transactionManager" />
</constructor-arg>
<property name="isolationLevelName" value="${sharedTransactionTemplate.isolationlevel:ISOLATION_READ_UNCOMMITTED}"/>
<property name="timeout" value="60"/>
</bean>
With the value 60, my program hits a timeout if the response from the DB takes more than 60 seconds. This is correct and what I expected.
I also found that there is a transaction timeout setting in the WAS Console:
Server --> WebSphere application servers --> my server
Under Container Settings --> click on Container Services --> Transaction service
On the Transaction service page, there is a value called "Total transaction lifetime timeout". I set it to 80.
In my application, one part triggers Spring's SimpleJobLauncher to run a Spring Batch job. In that job, I have a for loop that writes some data to a log file and does not interact with the DB.
I found that my for loop does not hit the 60-second timeout after 60 seconds. It only hits the 80-second timeout. I believe this is because it does not call the DB.
My code is something like the following:
@Autowired
@Qualifier("sharedTransactionTemplate")
private TransactionTemplate transactionTemplate;
transactionTemplate.execute(new TransactionCallbackWithoutResult() {
    @Override
    protected void doInTransactionWithoutResult(TransactionStatus status) {
        // In here I trigger the spring batch
    }
});
I would like to change this value to, for example, 70 seconds, in code or XML. I do not want to change it in the WAS Console because I want other methods to keep using the 80 seconds.
Any ideas?
Here is what my Spring Batch job does:
1. Calls the DB and updates something (done with no error).
2. The reader reads data from the DB (done with no error).
3. Before the write, I have a for loop that does not call the DB. This is where it hits the timeout; the timeout value is the one set in the WAS Console, not the one set in XML.
4. And so on...
I actually want something I can configure in XML, so that this Spring Batch job uses my own value and step 3 times out according to it.
Additional question: are the following classes only applicable to transactions that involve a database connection?
class="org.springframework.transaction.jta.WebSphereUowTransactionManager"
class="org.springframework.transaction.support.TransactionTemplate"
It is unclear from the information provided why you are executing your Spring Batch job transactionally; you may want to consider whether you need to. Although not a duplicate, this question is similar to this one, in which one possible solution is to start a UserTransaction for your Spring Batch job, whose timeout you can control. As pointed out in that answer and the subsequent comments, this method has some limitations and considerations.

Spring task scheduler first task execution time

After some research on the Spring task scheduler and task executor, I found that the Spring config below runs the myTask.run method every second.
<bean id="myTask" class="com.amazon.path.to.MyTask"/>
<task:scheduled-tasks scheduler="myScheduler">
<!--run once every second-->
<task:scheduled ref="myTask" method="run" fixed-rate="1000"/>
<!--alternatively, run constantly, waiting one second after each run finished to start the next-->
<!--<task:scheduled ref="myTask" method="run" fixed-delay="1000"/>-->
</task:scheduled-tasks>
<task:scheduler id="myScheduler" pool-size="1"/>
I want to know exactly when the first invocation of myTask.run occurs. I could not find it mentioned in the docs. I checked the Spring XSD for this. Also, the "task:scheduler" element creates an instance of ThreadPoolTaskScheduler, which according to the Spring docs has the method
scheduleAtFixedRate(Runnable task, Date startTime, long period)
which has a startTime parameter. But the XSD mentioned above has no such attribute.
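ThreadPoolTaskScheduler is backed by a JDK ScheduledExecutorService, whose scheduleAtFixedRate takes an explicit initial delay. The sketch below uses only the JDK to measure when the first invocation happens for a given initial delay; the class FirstRunTiming and its method are hypothetical. As far as I know, a fixed-rate task registered without a start time or initial delay is scheduled to start as soon as possible once the scheduler is ready, and later versions of the task namespace also offer an initial-delay attribute, so check the XSD for your Spring version.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustration only: measures the time until the first run of a fixed-rate task. */
class FirstRunTiming {

    /** Schedules a fixed-rate task and returns the milliseconds elapsed until its first invocation. */
    static long millisUntilFirstRun(long initialDelayMs, long periodMs) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        CountDownLatch firstRun = new CountDownLatch(1);
        long start = System.nanoTime();
        try {
            // The task only signals the latch; extra countDown calls on later runs are harmless.
            scheduler.scheduleAtFixedRate(firstRun::countDown, initialDelayMs, periodMs, TimeUnit.MILLISECONDS);
            if (!firstRun.await(initialDelayMs + 5000, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("task never ran");
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the first run", e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("no initial delay: first run after ~" + millisUntilFirstRun(0, 1000) + " ms");
        System.out.println("300 ms initial delay: first run after ~" + millisUntilFirstRun(300, 1000) + " ms");
    }
}
```

With an initial delay of zero the first run does not wait for one full period; the period only separates subsequent runs.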

How to have a Scheduler at the parent job for all child jobs?

The situation is as follows. I want to have a parent job with some common properties, an ExecutionListener and a Scheduler. There could be many child jobs that extend my parent job. The Scheduler in the parent needs to read all the child jobIds, pick up the corresponding cron expressions from a DB, and schedule the jobs. Something like this:
<job id="job1">
<step id="step1">
<tasklet><bean id="some bean"/></tasklet>
</step>
</job>
<bean id="myjob1" parent="parentJob">
<property name="job" value="job1"/>
<property name="jobId" value="123"/>
</bean>
Similarly, there could be more jobs extending "parentJob". In "parentJob" I am trying to do something like the following:
scheduler = new ThreadPoolTaskScheduler();
scheduler.setPoolSize(5);
scheduler.schedule(new TriggerTask(), new CronTrigger(someExpr)); // someExpr: cron expression read from the DB
The challenge is that the child jobIds are getting lost. At most the last child's jobId is picked up, but not the others. NOTE: TriggerTask is an inner class that implements Runnable.
I think I am messing something up with threads.
Could someone please assist or give some direction on how this could be achieved?
Thanks
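One possible cause, which the question does not confirm, is that the inner-class tasks read the jobId from a field shared by all children (for example a static field, or a field on a single parent instance), so each child overwrites the previous value before the scheduler runs anything. A minimal plain-Java sketch of the alternative, where every task captures its own immutable jobId at creation time; the names PerJobTasks, JobTask and runAll are hypothetical, and the direct run() calls stand in for a scheduler invoking the tasks later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Illustration only: one task object per job, each holding its own jobId. */
class PerJobTasks {

    /** Keeps its jobId in a final field instead of reading a shared, mutable one. */
    static final class JobTask implements Runnable {
        private final int jobId;
        private final List<Integer> executed;

        JobTask(int jobId, List<Integer> executed) {
            this.jobId = jobId;
            this.executed = executed;
        }

        @Override
        public void run() {
            // A real task would launch the job identified by jobId here.
            executed.add(jobId);
        }
    }

    /** Creates all tasks first, then runs them, and returns the jobIds in execution order. */
    static List<Integer> runAll(int... jobIds) {
        List<Integer> executed = new CopyOnWriteArrayList<>();
        List<Runnable> tasks = new ArrayList<>();
        for (int id : jobIds) {
            tasks.add(new JobTask(id, executed));
        }
        // Stand-in for the scheduler: tasks run only after every child has been registered.
        for (Runnable task : tasks) {
            task.run();
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(runAll(123, 456, 789));
    }
}
```

Because each JobTask owns its jobId, registering further children cannot change what an already created task will execute.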

What to choose ThreadPoolTaskExecutor or SimpleAsyncTaskExecutor in my case?

I am working on an existing application which has this piece of code:
<bean id="taskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="50" />
<property name="maxPoolSize" value="200" />
<property name="queueCapacity" value="250" />
</bean>
We have a method that uses the above taskExecutor to complete a particular task.
(This task must be completed, but it can be completed asynchronously.)
The task is responsible for inserting 100 documents into the database.
So I was planning to use SimpleAsyncTaskExecutor instead of ThreadPoolTaskExecutor.
Please let me know if this will impact performance or create any issues.
Our application is multithreaded, and there will be approximately 700 users at any time.
So I don't know how it will behave in the production environment (it may be fine during development).
As far as I know, SimpleAsyncTaskExecutor makes sense if you want to execute long-running tasks, e.g. compressing log files at the end of the day. If instead you want to execute a short-running task every n seconds or minutes, you should use ThreadPoolTaskExecutor, because it reuses system resources (threads).
Technically both variants will work, but I would use ThreadPoolTaskExecutor for your task.
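The difference in resource reuse can be sketched with the JDK alone. ThreadPoolTaskExecutor wraps java.util.concurrent.ThreadPoolExecutor, while SimpleAsyncTaskExecutor by default starts a new thread for every task. The class ExecutorComparison and its methods below are hypothetical and only count how many threads each strategy creates for the same number of trivial tasks; the pool sizes are taken from the configuration in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustration only: counts threads created by a pool versus a thread-per-task strategy. */
class ExecutorComparison {

    /** Thread factory that counts how many threads it has been asked to create. */
    static final class CountingFactory implements ThreadFactory {
        final AtomicInteger created = new AtomicInteger();

        @Override
        public Thread newThread(Runnable r) {
            created.incrementAndGet();
            return new Thread(r);
        }
    }

    /** Pooled strategy: threads beyond the core size are only added once the queue is full. */
    static int threadsUsedByPool(int tasks, int corePoolSize, int maxPoolSize, int queueCapacity) {
        CountingFactory factory = new CountingFactory();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(corePoolSize, maxPoolSize,
                60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(queueCapacity), factory);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> { });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return factory.created.get();
    }

    /** Thread-per-task strategy: every task gets a brand-new thread. */
    static int threadsUsedPerTask(int tasks) {
        CountingFactory factory = new CountingFactory();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            Thread t = factory.newThread(() -> { });
            t.start();
            threads.add(t);
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return factory.created.get();
    }

    public static void main(String[] args) {
        System.out.println("pool threads: " + threadsUsedByPool(100, 50, 200, 250));
        System.out.println("per-task threads: " + threadsUsedPerTask(100));
    }
}
```

With many concurrent users, the thread-per-task count grows with the number of submitted tasks, whereas the pool stays bounded by its configured sizes and queues the rest.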
