Spring Batch Item Reader Reading Same Items Multiple Times - spring

I am using a JpaPagingItemReader and I noticed that it is reading the same rows more than once. The documentation does say that the "Caller might get the same item twice from successive calls (or otherwise), if the first call was in a transaction that rolled back.", but when I checked the rollback count it was 0. I added all the listeners with logging in each method, and the reading, processing and writing all complete successfully (onError() is never called). What could be causing the same item to be read more than once?
UPDATE:
The JpaPagingItemReader is configured as follows:
<bean id="reader" class="org.springframework.batch.item.database.JpaPagingItemReader" scope="step"
      p:entityManagerFactory-ref="entityManagerFactory">
    <property name="queryString" value="SELECT item FROM Item item WHERE item.batchId = #{jobParameters[batchId]}"/>
</bean>
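One thing worth checking, though it is not confirmed in the thread: JpaPagingItemReader issues a separate query per page, so if the JPQL has no deterministic ORDER BY, or if the processor/writer updates rows so that they no longer match the WHERE clause, the page boundaries can shift between page queries and the same row can come back twice (or be skipped). Below is a minimal Java-config sketch of the same reader with an explicit ordering; the Item entity, its id field and the page size are assumptions, not taken from the question.

import java.util.Collections;

import javax.persistence.EntityManagerFactory;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.database.JpaPagingItemReader;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ReaderConfig {

    // Same reader as the XML above, but with an explicit ORDER BY so every page
    // query sees the rows in the same order. "Item" and its "id" field are
    // assumed entity names; the page size is an arbitrary example value.
    @Bean
    @StepScope
    public JpaPagingItemReader<Item> reader(EntityManagerFactory entityManagerFactory,
            @Value("#{jobParameters['batchId']}") Long batchId) {
        JpaPagingItemReader<Item> reader = new JpaPagingItemReader<>();
        reader.setEntityManagerFactory(entityManagerFactory);
        reader.setQueryString(
                "SELECT item FROM Item item WHERE item.batchId = :batchId ORDER BY item.id");
        reader.setParameterValues(Collections.<String, Object>singletonMap("batchId", batchId));
        reader.setPageSize(100);
        return reader;
    }
}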

Related

Spring Integration - Wait till finishes processing file

MyHandler takes about 10-20 seconds to process a huge 200MB csv/txt file. If I drop a file in the 'my.test.dir' directory, MyHandler keeps picking up the same file multiple times. To avoid this, I set prevent-duplicates to false. But I might get a file with the same file name after some time, and files with the same name are not picked up again later. Please suggest how to handle this scenario: MyHandler has to wait until it finishes processing the file.
<bean id="test-file-bean" class="com.test.MyHandler"/>
<int-file:inbound-channel-adapter
        id="test-adapter-inbound"
        directory="${my.test.dir}"
        channel="test-file-channel"
        filter="test-file-filter"
        prevent-duplicates="false"
        auto-startup="true"
        auto-create-directory="true">
    <int:poller fixed-delay="5"/>
</int-file:inbound-channel-adapter>
<int:service-activator
input-channel="test-file-channel" ref="test-file-bean" method="handleFlow"/>
Thanks.
Consider using a FileSystemPersistentAcceptOnceFileListFilter: it prevents duplicates but lets through files whose timestamp has changed.
See the docs for more info: https://docs.spring.io/spring-integration/docs/current/reference/html/file.html#file-reading.
There you will also find ChainFileListFilter if you need to combine it with your own filter.
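For illustration, a rough Java-config sketch of such a filter bean; the SimpleMetadataStore and the "test-" key prefix are assumptions, and the bean would be wired in as the test-file-filter referenced in the adapter above.

import java.io.File;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.file.filters.ChainFileListFilter;
import org.springframework.integration.file.filters.FileListFilter;
import org.springframework.integration.file.filters.FileSystemPersistentAcceptOnceFileListFilter;
import org.springframework.integration.metadata.SimpleMetadataStore;

@Configuration
public class FileFilterConfig {

    // Accepts each file once per (name, lastModified) pair: a brand new file or a
    // re-delivered file with a newer timestamp passes, an unchanged duplicate does not.
    @Bean
    public FileListFilter<File> testFileFilter() {
        ChainFileListFilter<File> chain = new ChainFileListFilter<>();
        chain.addFilter(new FileSystemPersistentAcceptOnceFileListFilter(
                new SimpleMetadataStore(), "test-"));
        // Additional custom filters could be chained here, as the answer suggests.
        return chain;
    }
}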

Determine the end of a cyclic workflow in spring integration (inbound-channel => service-activator)

We have the following simple int-jpa based workflow:
[inbound-channel-adapter] -> [service-activator]
The config is like this:
<int:channel id="inChannel"> <int:queue/> </int:channel>
<int:channel id="outChannel"> <int:queue/> </int:channel>
<int-jpa:inbound-channel-adapter id="inChannelAdapter" channel="inChannel"
        jpa-query="SOME_COMPLEX_POLLING_QUERY"
        max-results="2">
    <int:poller max-messages-per-poll="2" fixed-rate="20">
        <int:advice-chain synchronization-factory="txSyncFactory">
            <tx:advice transaction-manager="transactionManager">
                <tx:attributes>
                    <tx:method name="*" timeout="30000" />
                </tx:attributes>
            </tx:advice>
            <int:ref bean="pollerAdvice"/>
        </int:advice-chain>
    </int:poller>
</int-jpa:inbound-channel-adapter>
<int:service-activator input-channel="inChannel" ref="myActivator"
method="pollEntry" output-channel="outChannel" />
<bean id="myActivator" class="com.company.myActivator" />
<bean id="pollerAdvice" class="com.company.myPollerAdvice" />
The entry point for processing is a constantly growing table against which the SOME_COMPLEX_POLLING_QUERY is run. The current flow is:
[Thread-1] The SOME_COMPLEX_POLLING_QUERY only returns entries that have busy set to false (we set busy to true as soon as polling is done, using txSyncFactory).
[Thread-2] These entries then pass through myActivator, where processing might take anywhere from 1 min to 30 mins.
[Thread-2] Once the processing is done, we set busy back from true to false.
Problem: We need to trigger a notification event when the processing of all the entries that were present in the table is done.
Approach tried: We used the afterReturning of pollerAdvice to find out whether the SOME_COMPLEX_POLLING_QUERY returned any results. However, this method starts reporting "No Entries" well before Thread-2 is done processing all the entries.
Note:
The same entries will be processed again after 24 hrs, but by then there will be more of them.
We are not using an outbound-channel-adapter, since we don't have any requirement for it. However, we are open to using it if it is part of the proposed solution.
Not sure if this will work for you, but since you still need to hold the notification until Thread-2 is done, I would suggest having an AtomicBoolean bean. In the mentioned afterReturning(), when no data is polled from the DB, you just switch the AtomicBoolean to true. When Thread-2 finishes its work, it can call a <filter> to check the state of the AtomicBoolean and only then really perform an <int-event:outbound-channel-adapter> to emit a notification event.
So the final decision to emit the event (or not) is definitely made from Thread-2, not from the polling channel adapter.
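A rough sketch of that idea, with all names assumed rather than taken from the question:

import java.util.concurrent.atomic.AtomicBoolean;

// The poller advice records that a poll came back empty, and the processing
// thread (Thread-2) decides whether the notification may go out.
public class PollingExhaustedTracker {

    private final AtomicBoolean exhausted = new AtomicBoolean(false);

    // Call from afterReturning() in the poller advice when the poll returned no rows.
    public void markExhausted() {
        exhausted.set(true);
    }

    // Call from a <filter> after the service-activator (Thread-2) finishes an entry;
    // returns true exactly once per exhausted cycle, so only Thread-2 triggers the
    // <int-event:outbound-channel-adapter>.
    public boolean shouldNotify() {
        return exhausted.getAndSet(false);
    }
}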

How to override Transaction service timeout value in WAS Console by code?

In my XML file, I have something like the following:
<bean id="transactionManager"
class="org.springframework.transaction.jta.WebSphereUowTransactionManager"
p:defaultTimeout="60" />
<bean id="sharedTransactionTemplate"
class="org.springframework.transaction.support.TransactionTemplate">
<constructor-arg>
<ref bean="transactionManager" />
</constructor-arg>
<property name="isolationLevelName" value="${sharedTransactionTemplate.isolationlevel:ISOLATION_READ_UNCOMMITTED}"/>
<property name="timeout" value="60"/>
</bean>
With the value 60, my program hits a timeout if the response from the db takes more than 60 seconds. This is correct and also what I expected.
I also found that there is a transaction timeout setting in the WAS Console:
Server --> WebSphere application servers --> my server
Under Container Settings --> click on Container Services --> Transaction service
On the Transaction service page, there is a value called "Total transaction lifetime timeout". I set the value to 80.
In my application, I have a part that triggers Spring's SimpleJobLauncher to run a Spring Batch job. In that batch job, I have a for loop that writes some data to a log file and does not interact with the DB at all.
I found that the for loop does not hit the 60-second timeout after 60 seconds; it only hits the 80-second timeout. I believe this is because it does not call the db.
My code is something like the following:
@Autowired
@Qualifier("sharedTransactionTemplate")
private TransactionTemplate transactionTemplate;

transactionTemplate.execute(new TransactionCallbackWithoutResult() {
    @Override
    protected void doInTransactionWithoutResult(TransactionStatus status) {
        // In here I trigger the Spring Batch job
    }
});
I would like to change this value to, for example, 70 seconds in the XML (or in code in some other way). I do not want to edit it in the WAS Console because I still want the other methods to use the 80 seconds.
Any ideas?
Here is what my Spring Batch job does:
1. Call the db and update something. (done with no error)
2. The reader reads data from the db. (done with no error)
3. Before the write, there is a for loop that does not call the db. --> It hits the timeout here, and I found that the timeout value used is the one set in the WAS Console instead of the value set in the XML.
4. and so on...
I actually want something I can configure in the XML so that this batch job uses my own value, in particular so that step 3 uses my own value.
An additional question: are the following classes only applicable to transactions that involve a database connection?
class="org.springframework.transaction.jta.WebSphereUowTransactionManager"
class="org.springframework.transaction.support.TransactionTemplate"
It is unclear to me from the information you've provided why you are executing your Spring Batch job transactionally; you may want to consider whether you need to. Although not a duplicate, this question is similar to this one, in which you can see that one possible solution is to start a UserTransaction for your Spring Batch job, whose timeout you can control. As pointed out in that answer and the subsequent comments, there are some limitations and considerations when using this method.
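As a hedged illustration of that approach (the JNDI name is the standard java:comp/UserTransaction, the 70-second value is the one the question asks for, and the launch call is the usual JobLauncher API), the job launch could be wrapped in a UserTransaction whose timeout is set explicitly:

import javax.naming.InitialContext;
import javax.transaction.UserTransaction;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;

public class TimedJobRunner {

    // Wrap the job launch in a bean-managed transaction with its own timeout,
    // instead of relying on the container-wide "Total transaction lifetime timeout".
    public void runWithTimeout(JobLauncher jobLauncher, Job job, JobParameters params)
            throws Exception {
        UserTransaction utx = (UserTransaction) new InitialContext()
                .lookup("java:comp/UserTransaction");
        utx.setTransactionTimeout(70); // seconds; must be set before begin()
        utx.begin();
        try {
            jobLauncher.run(job, params);
            utx.commit();
        } catch (Exception e) {
            utx.rollback();
            throw e;
        }
    }
}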

spring batch job parameters to recurring operation

I have a basic Spring Batch (spring-core-3.1.1) application set up and running with the Quartz scheduler (1.8.6). It looks like this:
- the Spring Batch job has a MySQL datasource to save job state in the Spring Batch schema
- the job reader is a CSV file reader using the class org.springframework.batch.item.file.FlatFileItemReader
- the writer is a simple custom ItemWriter (output goes to the console)
- the Quartz scheduler is used to set up a CronTrigger along with a JobDetail bean
- the scheduler runs the job every 10 seconds (*/10 * * * * ?)
I want to customize this setup so that each job instance reads only X lines of the CSV file instead of the whole file, e.g. if there are 10 lines in the file and I want to read 2 lines per step, then a job instance should read only 2 lines instead of all 10 at once. For that I want to give the job dynamic parameters based on the number of lines already read, so that each job execution has unique and incrementing parameters, like a cursor into the file reader.
How can I achieve this?
My JobDetail property for the parameter:
<property name="jobDataAsMap">
    <map>
        <entry key="jobName" value="reportJob" />
        <entry key="jobLocator" value-ref="jobRegistry" />
        <entry key="jobLauncher" value-ref="jobLauncher" />
        <entry key="cursor" value="0"/>
        <!-- Gives an error on this one: <entry key="cursor" value="#{jobParameters['cursor']}"/> -->
    </map>
</property>
You can read the latest successfully completed job's parameters to find the last starting line, add 2 to that value, and launch the new job with the result.
You can read the job metadata using JobExplorer, or query Spring Batch's meta-data tables directly.
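A hedged sketch of the JobExplorer variant; the job name "reportJob" and the "cursor" key come from the config above, while the extra run.ts parameter (to keep each job instance unique) is an assumption:

import java.util.List;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.explore.JobExplorer;

public class CursorCalculator {

    // Build the parameters for the next run: last successfully used cursor + 2 lines.
    public JobParameters nextParameters(JobExplorer jobExplorer) {
        long lastCursor = lastCompletedCursor(jobExplorer);
        return new JobParametersBuilder()
                .addLong("cursor", lastCursor + 2)             // read the next 2 lines
                .addLong("run.ts", System.currentTimeMillis()) // keep each instance unique
                .toJobParameters();
    }

    // Find the "cursor" parameter of the most recent COMPLETED execution of reportJob.
    private long lastCompletedCursor(JobExplorer jobExplorer) {
        List<JobInstance> instances = jobExplorer.getJobInstances("reportJob", 0, 10);
        for (JobInstance instance : instances) { // most recent instances first
            for (JobExecution execution : jobExplorer.getJobExecutions(instance)) {
                if (execution.getStatus() == BatchStatus.COMPLETED) {
                    Long cursor = execution.getJobParameters().getLong("cursor");
                    return cursor != null ? cursor : 0L;
                }
            }
        }
        return 0L; // no completed run yet: start at the top of the file
    }
}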

What to choose ThreadPoolTaskExecutor or SimpleAsyncTaskExecutor in my case?

I am working on an existing application which has this piece of code:
<bean id="taskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="50" />
<property name="maxPoolSize" value="200" />
<property name="queueCapacity" value="250" />
</bean>
We have a method which uses the above taskExecutor to complete a particular task.
(This task must be completed, but it can be completed asynchronously.)
This particular task is responsible for inserting 100 documents into the database.
So I was planning to use SimpleAsyncTaskExecutor instead of ThreadPoolTaskExecutor.
Please let me know if this will impact performance or create any issues.
Our application is multithreaded, and there will be approximately 700 users at any time,
so I don't know how it will behave in the production environment (it may be fine during development).
As far as I know, using SimpleAsyncTaskExecutor makes sense when you want to execute long-running tasks, e.g. compressing log files at the end of the day. If instead you execute a short-running task every n seconds or minutes, you should use ThreadPoolTaskExecutor, because it reuses system resources.
Technically both variants will work, but I would use ThreadPoolTaskExecutor for your task.
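To make the trade-off concrete, here is a small Java-config sketch of both options; the pool values are the ones from the XML above, while the thread-name prefix and concurrency limit are assumptions:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class ExecutorConfig {

    // The existing setup: a bounded pool whose threads are reused across tasks,
    // which keeps thread creation under control with ~700 concurrent users.
    @Bean
    public ThreadPoolTaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(50);
        executor.setMaxPoolSize(200);
        executor.setQueueCapacity(250);
        return executor;
    }

    // The alternative from the question: a new thread per task. The only throttle
    // is the concurrency limit (value here is an assumption); there is no queue
    // and no thread reuse.
    @Bean
    public SimpleAsyncTaskExecutor simpleTaskExecutor() {
        SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor("doc-insert-");
        executor.setConcurrencyLimit(200);
        return executor;
    }
}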
