Reading files with max-messages-per-poll=10 and prevent-duplicates=false - Spring

I'm trying to read files from a directory. If a file cannot be processed, it stays there to be retried later.
<file:inbound-channel-adapter id="fileInput"
    directory="file:${java.io.dir}/input-data"
    auto-create-directory="true"
    prevent-duplicates="false"
    filter="compositeFileFilterBean"/>
<integration:poller id="poller" max-messages-per-poll="10" default="true">
    <integration:interval-trigger interval="60" time-unit="SECONDS"/>
</integration:poller>
The problem is that if max-messages-per-poll is set to, say, 10, then each poll returns exactly 10 messages, even if there is only 1 file (i.e. all 10 messages will be the same file).

Yes, that would be the expected behavior with those settings.
I am not sure why you think that is wrong.
If there is a file in the directory that is not filtered by a filter (such as the one that prevents duplicates), it will be found by the poller, either within the current poll (when max-messages-per-poll is > 1) or on the next poll.
To do what you want, you would need a custom filter that filters out files that were already found within your 60-second polling interval.

You can:
Option 1. Set the prevent-duplicates property to true on the inbound-channel-adapter. This property defaults to true, but only when no other filter (and no filename-pattern or filename-regex) is in place. When you configure a custom filter, Spring assumes that your filter takes care of duplicates itself (for example by including an AcceptOnceFileListFilter), so it effectively sets prevent-duplicates to false.
Option 2. Extend the "compositeFileFilterBean" bean with an org.springframework.integration.file.filters.AcceptOnceFileListFilter, as sketched below.
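A minimal sketch of such a composite filter, assuming your existing filter is registered as "myCustomFilter" (that bean name is illustrative):
<bean id="compositeFileFilterBean"
      class="org.springframework.integration.file.filters.CompositeFileListFilter">
    <constructor-arg>
        <list>
            <!-- rejects files that were already seen by this adapter -->
            <bean class="org.springframework.integration.file.filters.AcceptOnceFileListFilter"/>
            <ref bean="myCustomFilter"/>
        </list>
    </constructor-arg>
</bean>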

Related

Spring Integration - Wait till it finishes processing the file

MyHandler takes about 10-20 seconds (approximately) to process a huge 200MB csv/txt file. If I drop a file into the 'my.test.dir' directory, MyHandler keeps picking up the same file multiple times. To avoid this, I set prevent-duplicates to false. But I might get a file with the same file name after some time, and files with the same name are not picked up later. Please suggest how to handle this scenario; MyHandler has to wait until it finishes processing the file.
<bean id="test-file-bean" class="com.test.MyHandler"/>
<int-file:inbound-channel-adapter
id="test-adapter-inbound"
directory="${my.test.dir}"
channel="test-file-channel"
filter="test-file-filter"
prevent-duplicates="false" auto-startup="true"
auto-create-directory="true">
<int:poller fixed-delay="5"/>
</int-file:inbound-channel-adapter>
<int:service-activator
input-channel="test-file-channel" ref="test-file-bean" method="handleFlow"/>
Thanks.
Consider using a FileSystemPersistentAcceptOnceFileListFilter, which prevents duplicates but passes files whose timestamp has changed.
See more info in the docs: https://docs.spring.io/spring-integration/docs/current/reference/html/file.html#file-reading.
There you can also find the ChainFileListFilter, in case you need to combine it with your own filter; one way to wire it up is sketched below.
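A rough sketch (it assumes an in-memory SimpleMetadataStore, which forgets seen files on restart; the "test-" key prefix is arbitrary):
<bean id="metadataStore" class="org.springframework.integration.metadata.SimpleMetadataStore"/>
<bean id="acceptOnceFilter"
      class="org.springframework.integration.file.filters.FileSystemPersistentAcceptOnceFileListFilter">
    <constructor-arg ref="metadataStore"/>
    <constructor-arg value="test-"/>
</bean>
<bean id="test-file-filter"
      class="org.springframework.integration.file.filters.ChainFileListFilter">
    <constructor-arg>
        <list>
            <ref bean="acceptOnceFilter"/>
            <!-- add your own filters to the chain here -->
        </list>
    </constructor-arg>
</bean>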

How to override the WAS Console Transaction service timeout value in code?

In my xml file, I have something as follows:
<bean id="transactionManager"
class="org.springframework.transaction.jta.WebSphereUowTransactionManager"
p:defaultTimeout="60" />
<bean id="sharedTransactionTemplate"
class="org.springframework.transaction.support.TransactionTemplate">
<constructor-arg>
<ref bean="transactionManager" />
</constructor-arg>
<property name="isolationLevelName" value="${sharedTransactionTemplate.isolationlevel:ISOLATION_READ_UNCOMMITTED}"/>
<property name="timeout" value="60"/>
</bean>
With the value 60, my program hits a timeout if the response from the db takes more than 60 seconds. This is correct and also what I expected.
And I found that there is a transaction timeout setting in the WAS Console as well:
Server --> WebSphere application servers --> my server
Under Container Settings --> click on Container Services --> Transaction service
On the Transaction service page, there is a value called "Total transaction lifetime timeout". I set this value to 80.
In my application, I have a part that triggers Spring's SimpleJobLauncher to run a Spring Batch job. In that batch job, I have a for loop which writes some data to a log file and does not interact with the DB at all.
I found that my for loop does not hit the 60-second timeout after 60 seconds; it only hits the 80-second timeout. I believe that is because it doesn't call the db.
My code is something as follow:
@Autowired
@Qualifier("sharedTransactionTemplate")
private TransactionTemplate transactionTemplate;

transactionTemplate.execute(new TransactionCallbackWithoutResult() {
    @Override
    protected void doInTransactionWithoutResult(TransactionStatus status) {
        // In here I trigger the Spring Batch job
    }
});
I would like to change this value to, for example, 70 seconds in XML or in code. I do not want to edit it in the WAS Console because I still want other methods to keep using the 80 seconds.
Any ideas?
Here is what my Spring Batch job does:
1. Call db, update something. (done with no error)
2. Reader, read data from db. (done with no error)
3. Before the write, I have a for loop which does not call the db. --> It hits the timeout here, and I found that the timeout value used is the one set in the WAS Console instead of the value set in xml.
4. and so on...
I actually want something that I can configure in xml, so that this Spring Batch job uses my own value set in xml, and my step 3 uses my own value too.
An additional question: are the following classes only applicable to transactions that involve a database connection?
org.springframework.transaction.jta.WebSphereUowTransactionManager
org.springframework.transaction.support.TransactionTemplate
It is unclear to me from the info you've provided why you are executing your Spring Batch job transactionally; you may want to consider whether you need to do so at all. Although not a duplicate, this question is similar to this one, in which you can see that one possible solution is to start a UserTransaction for your Spring Batch job, whose timeout you can control yourself. As pointed out in that answer and the subsequent comments, there are some limitations and considerations when using this method.
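For illustration, a minimal, untested sketch of that approach (it assumes the standard JNDI name for the container-managed UserTransaction; jobLauncher, job and jobParameters come from your own context):
import javax.naming.InitialContext;
import javax.transaction.UserTransaction;

UserTransaction ut = (UserTransaction) new InitialContext().lookup("java:comp/UserTransaction");
ut.setTransactionTimeout(70); // seconds; applies to transactions begun after this call
ut.begin();
try {
    jobLauncher.run(job, jobParameters); // trigger the Spring Batch job here
    ut.commit();
} catch (Exception e) {
    ut.rollback();
    throw e;
}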

Aggregating response from asynchronous publish subscribe channel

I need to call 4 web services asynchronously and aggregate the results into a single message. If one of the services takes longer to respond than the specified timeout (3 sec), then the responses which have already arrived should be aggregated and the late-arriving messages should be discarded. For this I used the snippet below in my Spring configuration file:
<int:aggregator input-channel="aggregatorInputChannel"
    group-timeout="3000"
    send-partial-result-on-expiry="true"
    expire-groups-upon-completion="true"
    output-channel="aggregatorOutputChannel"
    ref="responseAggregator" method="populateResponseHeader"/>
When one of the web service calls (let's say service4) takes more time than the timeout value, the thread for service4 keeps running in the background and the server sends a 202 response. Any suggestions on how I should modify my aggregator to ignore the messages which arrive later than the timeout and still get the response?
First of all, you should take a look at the Scatter-Gather pattern.
It looks fully sufficient for your use case.
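A rough, untested sketch of what that could look like (the channel names are assumptions; the gatherer accepts the same attributes as an aggregator):
<int:scatter-gather input-channel="fanoutChannel"
    output-channel="aggregatorOutputChannel"
    gather-timeout="3000">
    <int:scatterer apply-sequence="true">
        <int:recipient channel="service1Channel"/>
        <int:recipient channel="service2Channel"/>
        <int:recipient channel="service3Channel"/>
        <int:recipient channel="service4Channel"/>
    </int:scatterer>
    <int:gatherer ref="responseAggregator" method="populateResponseHeader"
        group-timeout="3000" send-partial-result-on-expiry="true"/>
</int:scatter-gather>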
You should use expire-groups-upon-timeout="false":
<xsd:attribute name="expire-groups-upon-timeout">
<xsd:annotation>
<xsd:documentation>
Boolean flag specifying, if a group is completed due to timeout (reaper or
'group-timeout(-expression)'), whether the group should be removed.
When true, late arriving messages will form a new group. When false, they
will be discarded. Default is 'true' for an aggregator and 'false' for a
resequencer.
</xsd:documentation>
</xsd:annotation>
<xsd:simpleType>
<xsd:union memberTypes="xsd:boolean xsd:string" />
</xsd:simpleType>
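Applied to your configuration, only the new attribute is added:
<int:aggregator input-channel="aggregatorInputChannel"
    group-timeout="3000"
    send-partial-result-on-expiry="true"
    expire-groups-upon-completion="true"
    expire-groups-upon-timeout="false"
    output-channel="aggregatorOutputChannel"
    ref="responseAggregator" method="populateResponseHeader"/>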

Is there a way to set a timeout for the commit-interval on a Spring Batch job?

We have data streaming in on an irregular basis and in quantities that I cannot predict. I currently have the commit-interval set to 1 because we want data to be written as soon as we receive it. We sometimes get large numbers of items at a time (~1000-50000 items in a second) which I would like to commit in larger chunks, as it takes a while to write these individually. Is there a way to set a timeout on the commit-interval?
Goal: we set the commit-interval to 10000, we get 9900 items, and after 1 second it commits the 9900 items rather than waiting until it receives 100 more.
Currently, when we set the commit-interval greater than 1, we just see data waiting to be written until it hits the amount specified by the commit-interval.
How is your data streaming in? Is it being loaded into a work table? Added to a queue? Typically you'd just drain the work table or queue with whatever commit interval performs best, then re-run the job periodically to check whether a new batch of inbound records has been received.
Either way, I would typically leverage flow control to have your job loop and simply process as many records as are ready for a given time interval:
<job id="job">
<decision id="decision" decider="decider">
<next on="PROCESS" to="processStep" />
<next on="DECIDE" to="decision" />
<end on="COMPLETED" />
<fail on="*" />
</decision>
<step id="processStep">
<!-- your step here -->
</step>
</job>
<beans:bean id="decider" class="com.package.MyDecider"/>
Then your decider would do something like this:
if (maxTimeReached) {
return END;
}
if (hasRecords) {
return PROCESS;
} else {
wait X seconds;
return DECIDE;
}
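A runnable sketch of such a decider (the one-minute budget and the 5-second wait are placeholder values, and hasRecords() is a hypothetical check you would implement against your work table or queue):
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

public class MyDecider implements JobExecutionDecider {

    private final long startTime = System.currentTimeMillis();
    private final long maxRunMillis = 60_000; // assumption: total loop budget

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        if (System.currentTimeMillis() - startTime > maxRunMillis) {
            return FlowExecutionStatus.COMPLETED; // matches <end on="COMPLETED"/>
        }
        if (hasRecords()) {
            return new FlowExecutionStatus("PROCESS");
        }
        try {
            Thread.sleep(5_000); // "wait X seconds" before deciding again
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new FlowExecutionStatus("DECIDE");
    }

    private boolean hasRecords() {
        // hypothetical: query your work table or queue for pending items
        return false;
    }
}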

Spring Integration - parallel file processing by group

I am trying to experiment with Spring Integration on a simple task. I have a folder where I receive incoming files. The files are named after a group ID.
I want all the files in the same groupId to be processed in sequence but files with different groupIds can be processed in parallel.
I started putting together a configuration like this:
<int:service-activator input-channel="filesInChannel"
output-channel="outputChannelAdapter">
<bean class="com.ingestion.FileProcessor" />
</int:service-activator>
<int:channel id="filesInChannel" />
<int-file:inbound-channel-adapter id="inputChannelAdapter"
channel="filesInChannel" directory="${in.file.path}" prevent-duplicates="true"
filename-pattern="${file.pattern}">
<int:poller id="poller" fixed-rate="1" task-executor="executor"/>
</int-file:inbound-channel-adapter>
<int-file:outbound-channel-adapter id="outputChannelAdapter" directory="${ok.file.path}" delete-source-files="true"/>
<task:executor id="executor" pool-size="10"/>
This processes all the incoming files with 10 threads. What steps do I need to take to split the files by groupId and have them processed with one thread per groupId?
Thanks.
Assuming a finite number of group ids, you could use a different adapter for each group (each with a single thread, all feeding into the same channel), each with a different pattern.
Or you could create a custom FileListFilter and use some kind of thread affinity to assign files from each group to a specific thread, with the filter only returning that thread's file(s); see the sketch below.
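As an illustration of the second approach, a minimal sketch (it assumes the group id is the filename prefix before an underscore; that naming convention and the hash-based partitioning are assumptions):
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.springframework.integration.file.filters.FileListFilter;

public class GroupAffinityFileListFilter implements FileListFilter<File> {

    private final int partition;       // index of the adapter/thread this filter serves
    private final int totalPartitions; // how many single-threaded adapters you run

    public GroupAffinityFileListFilter(int partition, int totalPartitions) {
        this.partition = partition;
        this.totalPartitions = totalPartitions;
    }

    @Override
    public List<File> filterFiles(File[] files) {
        List<File> accepted = new ArrayList<>();
        for (File file : files) {
            // assumption: file names look like "<groupId>_rest-of-name"
            String groupId = file.getName().split("_")[0];
            if (Math.floorMod(groupId.hashCode(), totalPartitions) == partition) {
                accepted.add(file);
            }
        }
        return accepted;
    }
}
Each single-threaded adapter gets its own instance with a distinct partition index, so all files of a given group are always listed, and therefore processed, by the same adapter.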
