Spring Integration - Wait until file processing finishes

The MyHandler class takes roughly 10-20 seconds to process a huge 200MB csv/txt file. If I drop a file in the 'my.test.dir' directory, MyHandler keeps picking up the same file multiple times. To avoid this, I set prevent-duplicates to false. But I might get a file with the same name after some time, and files with the same name are not picked up later. How should I handle this scenario? MyHandler has to wait until it finishes processing the file.
<bean id="test-file-bean" class="com.test.MyHandler"/>
<int-file:inbound-channel-adapter
id="test-adapter-inbound"
directory="${my.test.dir}"
channel="test-file-channel"
filter="test-file-filter"
prevent-duplicates="false" auto-startup="true"
auto-create-directory="true">
<int:poller fixed-delay="5"/>
</int-file:inbound-channel-adapter>
<int:service-activator
input-channel="test-file-channel" ref="test-file-bean" method="handleFlow"/>
Thanks.

Consider using a FileSystemPersistentAcceptOnceFileListFilter, which prevents duplicates but still passes files whose timestamp has changed.
See the docs for more info: https://docs.spring.io/spring-integration/docs/current/reference/html/file.html#file-reading.
There you can also find a ChainFileListFilter if you need to combine it with your own filter.
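To make the idea concrete, here is a minimal plain-Java sketch of "accept once per name and timestamp". It is not the Spring class itself; the class name AcceptOnceByTimestamp and its in-memory map are assumptions for illustration only (the Spring filter keeps its state in a metadata store).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in that only illustrates the idea behind the suggested filter.
class AcceptOnceByTimestamp {
    // file name -> last-modified timestamp recorded on the previous sighting
    private final Map<String, Long> seen = new ConcurrentHashMap<>();

    /** Returns true when the name is new or its timestamp differs from the recorded one. */
    boolean accept(String fileName, long lastModified) {
        Long previous = seen.put(fileName, lastModified);
        return previous == null || previous != lastModified;
    }

    public static void main(String[] args) {
        AcceptOnceByTimestamp filter = new AcceptOnceByTimestamp();
        System.out.println(filter.accept("data.csv", 1000L)); // first sighting: true
        System.out.println(filter.accept("data.csv", 1000L)); // polled again, unchanged: false
        System.out.println(filter.accept("data.csv", 2000L)); // same name, newer timestamp: true
    }
}
```

With this behaviour the file is handed to the handler once while it is being processed, and a later file with the same name is accepted again because its timestamp differs.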

Related

Determine the end of a cyclic workflow in spring integration (inbound-channel => service-activator)

We have the following simple int-jpa based workflow:
[inbound-channel-adapter] -> [service-activator]
The config is like this:
<int:channel id="inChannel"> <int:queue/> </int:channel>
<int:channel id="outChannel"> <int:queue/> </int:channel>
<int-jpa:inbound-channel-adapter id="inChannelAdapter" channel="inChannel"
jpa-query="SOME_COMPLEX_POLLING_QUERY"
max-results="2">
<int:poller max-messages-per-poll="2" fixed-rate="20" >
<int:advice-chain synchronization-factory="txSyncFactory" >
<tx:advice transaction-manager="transactionManager" >
<tx:attributes>
<tx:method name="*" timeout="30000" />
</tx:attributes>
</tx:advice>
<int:ref bean="pollerAdvice"/>
</int:advice-chain>
</int:poller>
</int-jpa:inbound-channel-adapter>
<int:service-activator input-channel="inChannel" ref="myActivator"
method="pollEntry" output-channel="outChannel" />
<bean id="myActivator" class="com.company.myActivator" />
<bean id="pollerAdvice" class="com.company.myPollerAdvice" />
The entry point for processing is a constantly growing table against which the SOME_COMPLEX_POLLING_QUERY is run. The current flow is:
[Thread-1] The SOME_COMPLEX_POLLING_QUERY only returns entries that have busy set to false (we set busy to true as soon as polling is done, using txSyncFactory).
[Thread-2] These entries pass through myActivator, where processing might take anywhere from 1 min to 30 mins.
[Thread-2] Once the processing is done, we set busy back from true to false.
Problem: We need to trigger a notification event when the processing of all the entries that were present in the table is done.
Approach tried: We used the afterReturning of pollerAdvice to find out whether the SOME_COMPLEX_POLLING_QUERY returned any results. However, this method starts returning "No Entries" well before Thread-2 is done processing all the entries.
Note:
The same entries will be processed again after 24 hrs, but by then the table will have more entries.
We are not using an outbound-channel-adapter, since we don't have any requirement for it. However, we are open to using it if that is part of the proposed solution.
Not sure if that will work for you, but since the notification still has to wait for Thread-2, I would suggest having an AtomicBoolean bean. In the mentioned afterReturning(), when no data is polled from the DB, you just set the AtomicBoolean to true. When Thread-2 finishes its work, it can go through a <filter> that checks the state of the AtomicBoolean, and only then reach an <int-event:outbound-channel-adapter> to emit the notification event.
So the final decision whether to emit the event is made from Thread-2, not from the polling channel adapter.
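A rough plain-Java sketch of that gate follows. It widens the single AtomicBoolean into a small synchronized state object, because with several entries in flight the flag alone cannot tell which worker is the last one. The class name CompletionGate, the in-flight counter, and the workSeen flag are assumptions added for illustration, not something from the question or the answer.

```java
// Hypothetical helper: decides when "poller found nothing" and "all workers done" are both true.
class CompletionGate {
    private int inFlight;           // entries handed to workers and not yet finished
    private boolean pollerDrained;  // the most recent poll returned no entries
    private boolean workSeen;       // at least one entry arrived since the last notification

    /** Call from the poller advice (e.g. afterReturning) with the number of polled entries. */
    synchronized boolean afterPoll(int polledEntries) {
        if (polledEntries > 0) {
            inFlight += polledEntries;
            workSeen = true;
            pollerDrained = false;
            return false;
        }
        pollerDrained = true;
        return tryComplete();
    }

    /** Call from the worker thread when one entry is fully processed. */
    synchronized boolean entryFinished() {
        inFlight--;
        return tryComplete();
    }

    // True exactly once per batch: nothing left to poll, nothing in flight, work was done.
    private boolean tryComplete() {
        if (pollerDrained && inFlight == 0 && workSeen) {
            workSeen = false;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CompletionGate gate = new CompletionGate();
        System.out.println(gate.afterPoll(2));     // false: two entries just started
        System.out.println(gate.entryFinished());  // false: one entry still running
        System.out.println(gate.afterPoll(0));     // false: poller is empty, worker is not done
        System.out.println(gate.entryFinished());  // true: last worker finished, emit the event
        System.out.println(gate.afterPoll(0));     // false: already notified for this batch
    }
}
```

Whichever call returns true is the point at which the notification event would be sent; in the flow described above that is normally the worker thread.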

How to override Transaction service timeout value in WAS Console by code?

In my xml file, I have something like this:
<bean id="transactionManager"
class="org.springframework.transaction.jta.WebSphereUowTransactionManager"
p:defaultTimeout="60" />
<bean id="sharedTransactionTemplate"
class="org.springframework.transaction.support.TransactionTemplate">
<constructor-arg>
<ref bean="transactionManager" />
</constructor-arg>
<property name="isolationLevelName" value="${sharedTransactionTemplate.isolationlevel:ISOLATION_READ_UNCOMMITTED}"/>
<property name="timeout" value="60"/>
</bean>
With the value 60, my program hits the timeout if the response from the db takes more than 60 seconds. This is correct and what I expected.
I also found that there is a transaction timeout setting in the WAS Console:
Server --> WebSphere application servers --> my server
Under Container Settings --> click on Container Services --> Transaction service
On the Transaction service page, there is a value called "Total transaction lifetime timeout". I set it to 80.
In my application, one part triggers Spring's SimpleJobLauncher to run a Spring Batch job. In that job, I have a for loop that writes some data to a log file and does not interact with the DB.
I found that my for loop does not hit the timeout after 60 seconds; it only hits the 80-second timeout. I believe this is because it does not call the db.
My code is something like this:
@Autowired
@Qualifier("sharedTransactionTemplate")
private TransactionTemplate transactionTemplate;

transactionTemplate.execute(new TransactionCallbackWithoutResult() {
    @Override
    protected void doInTransactionWithoutResult(TransactionStatus status) {
        // In here I trigger the spring batch
    }
});
I would like to change this value to, for example, 70 seconds from code, xml, or any other way. I do not want to change it in the WAS Console because I want other methods to keep using the 80 seconds.
Any ideas?
Here is what my Spring Batch job does:
Call db, update something. (done with no error)
reader, read data from db. (done with no error)
Before the write, I have a for loop that does not call the db. --> The timeout is hit here, and the timeout value is the one set in the WAS Console instead of the one set in xml.
and so on...
I actually want something I can configure in xml, so that this Spring Batch job uses my own value set in xml and step 3 can use it.
Additional question: are the following classes only applicable to transactions that involve a database connection?
class="org.springframework.transaction.jta.WebSphereUowTransactionManager"
class="org.springframework.transaction.support.TransactionTemplate"
It is unclear to me from the info you've provided why you are executing your Spring Batch job transactionally; you may want to consider whether you need to. Although not a duplicate, this question is similar to this one, in which one possible solution is to start a UserTransaction for your Spring Batch job whose timeout you can control. As pointed out in that answer and its subsequent comments, this method has some limitations and considerations.
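As a hedged illustration of the call order such a solution depends on: in JTA, UserTransaction.setTransactionTimeout(seconds) applies to transactions begun afterwards on the same thread, so it has to be called before begin(). The TxControl interface below is a local stand-in declared only so the sketch compiles without a JTA jar; it mirrors the method names of javax.transaction.UserTransaction, and RecordingTx is a fake used for the demo. In the application server the real UserTransaction would be obtained from the container instead.

```java
import java.util.ArrayList;
import java.util.List;

class JobTimeoutSketch {
    /** Local stand-in for the UserTransaction methods used here (assumption, see lead-in). */
    interface TxControl {
        void setTransactionTimeout(int seconds) throws Exception;
        void begin() throws Exception;
        void commit() throws Exception;
        void rollback() throws Exception;
    }

    /** Runs the job in its own transaction with a per-call timeout. */
    static void runWithTimeout(TxControl tx, int timeoutSeconds, Runnable job) throws Exception {
        tx.setTransactionTimeout(timeoutSeconds); // must come before begin()
        tx.begin();
        try {
            job.run();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        }
        tx.commit();
    }

    /** Fake transaction that only records the calls it receives. */
    static class RecordingTx implements TxControl {
        final List<String> calls = new ArrayList<>();
        public void setTransactionTimeout(int seconds) { calls.add("timeout=" + seconds); }
        public void begin() { calls.add("begin"); }
        public void commit() { calls.add("commit"); }
        public void rollback() { calls.add("rollback"); }
    }

    public static void main(String[] args) throws Exception {
        RecordingTx tx = new RecordingTx();
        runWithTimeout(tx, 70, () -> { /* the batch job would be launched here */ });
        System.out.println(tx.calls); // [timeout=70, begin, commit]
    }
}
```

Whether the container honours the per-transaction value over the console setting is subject to the limitations mentioned in the linked answer.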

Drools loading session seems to fire rules

I am at a loss with this and can't seem to find an answer in the docs. I am observing the following behaviour. I have this rule:
import function util.CSVParser.parse;
declare Passenger
@role(event)
@expires(24h)
end
rule "Parse and Insert CSV"
when
CSVReadyEvent( $csv_location : reader ) from entry-point "CSVReadyEntryPoint";
$p : Passenger() from parse($csv_location);
then
insert( $p );
end
I can then enter my CSVReadyEvent into my session and call fireAllRules and it executes correctly. It hits the safe point at the end, and all is cool.
I then restart my app and load the session like this:
KieSession loadedKieSession = kieServices.getKieService().getStoreServices().loadKieSession(session.getId(), kieBase, ksConf, kieServices.getEnvironment());
The base and config I take from my kmodule.xml.
What happens now is that, WITHOUT calling fireAllRules(), loading the session somehow triggers firing all rules.
I do not understand how unmarshalling triggers rule execution, but this is obviously wrong. I have already executed that rule, and it should not be executed twice.
In a test case (my tests do NOT create persistent sessions because I only want the rules to be tested) I can call fireAllRules() twice, and the second call does not trigger any matched rules. I am not exactly sure what goes wrong, but the persistent session seems to be loaded in an odd way. Or the persisting of the session is wonky and forgets that it has already executed the rule.
Does anyone have insight into this? I am more than happy to share any code.
Here's my persistence.xml:
<persistence-unit name="org.jbpm.persistence.jpa" transaction-type="JTA">
<provider>org.hibernate.ejb.HibernatePersistence</provider>
<class>org.drools.persistence.info.SessionInfo</class>
<class>org.drools.persistence.info.WorkItemInfo</class>
<exclude-unlisted-classes>true</exclude-unlisted-classes>
<properties>
<property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect" />
<property name="hibernate.max_fetch_depth" value="30" />
<property name="hibernate.hbm2ddl.auto" value="update" />
<property name="hibernate.show_sql" value="true" />
<property name="hibernate.transaction.jta.platform" value="org.hibernate.service.jta.platform.internal.JBossStandAloneJtaPlatform" />
</properties>
</persistence-unit>
Thanks!
An update/answer from a painful painful painful day of debugging and testing and running stuff:
I suspected my hibernate setup was wrong, so the wrong thing got persisted. I ended up throwing that approach away and writing a manual marshalling/de-marshalling thing.
After creating/loading/recreating/loading I can confirm the session NEVER changes on file.
This was interesting to me because I could swear that the rules are executed, and I was half right:
The WHEN part is executed when the session is loaded. Why? I have not the slightest idea...
I was chasing a red herring because I am calling a function in my when part (as you can see in the rule) to iterate and insert all facts based on the event I am receiving.
My parse function obviously has logging, so each time I reload the session, I get a storm of log output flying through my terminal, hinting that my rules are being executed.
I then changed my rules to be very, very specific (as in, output everywhere I possibly can). I debugged as deep as I could and I still can't pinpoint why on earth recreating the session executes the when part of a rule. I settled on this: Magic. And with a little more detail:
The documentation of drools persistence https://docs.jboss.org/jbpm/v6.2/userguide/jBPMPersistence.html states that the developers implemented their own serialize/deserialize strategy in order to speed up the process. I resolve to blame this custom strategy for what I am seeing.
Lesson learned:
Do NOT create objects in the when part (because this slows you down when loading a session, since all when parts are executed).
Chasing red herrings is a pain in my butt.
So to sum up: I believe (say, up to 99%) that loading a session does NOT execute the rules.
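The distinction I ended up with can be shown with a toy model. This is not Drools code and says nothing about how Drools works internally; ToyRuleSession and its methods are made up purely to show that evaluating a condition and running a consequence are separate phases, so a side effect inside a condition (like the logging in my parse function) can appear without any consequence having run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Toy model only: one rule, with a "when" that has a visible side effect.
class ToyRuleSession {
    int conditionCalls = 0;    // how often the "when" part was evaluated
    int consequenceCalls = 0;  // how often the "then" part was executed
    private final List<String> agenda = new ArrayList<>();
    private final Predicate<String> when = fact -> {
        conditionCalls++;                 // stands in for logging inside parse()
        return fact.endsWith(".csv");
    };
    private final Consumer<String> then = fact -> consequenceCalls++;

    /** Matching phase: evaluates the condition and queues a match, runs no consequence. */
    void match(String fact) {
        if (when.test(fact)) {
            agenda.add(fact);
        }
    }

    /** Firing phase: runs the consequence for every queued match. */
    void fireAllRules() {
        for (String fact : agenda) {
            then.accept(fact);
        }
        agenda.clear();
    }

    public static void main(String[] args) {
        ToyRuleSession session = new ToyRuleSession();
        session.match("passengers.csv");
        // Condition side effect is visible although nothing has fired yet.
        System.out.println(session.conditionCalls + " " + session.consequenceCalls); // 1 0
        session.fireAllRules();
        System.out.println(session.conditionCalls + " " + session.consequenceCalls); // 1 1
    }
}
```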
Using events in real-time mode in a STREAM session driven by fireUntilHalt on the one hand, and saving and restarting sessions with fireAllRules on the other, are somewhat contradictory paradigms.
If you have events, I suggest that you use the API to set up and start a (stateful) session in a thread, and insert facts (events) as they arrive.

Spring Integration - parallel file processing by group

I am trying to experiment with Spring Integration on a simple task. I have a folder where I get incoming files. The files are named after a group ID.
I want all the files in the same groupId to be processed in sequence but files with different groupIds can be processed in parallel.
I started putting together a configuration like this:
<int:service-activator input-channel="filesInChannel"
output-channel="outputChannelAdapter">
<bean class="com.ingestion.FileProcessor" />
</int:service-activator>
<int:channel id="filesInChannel" />
<int-file:inbound-channel-adapter id="inputChannelAdapter"
channel="filesInChannel" directory="${in.file.path}" prevent-duplicates="true"
filename-pattern="${file.pattern}">
<int:poller id="poller" fixed-rate="1" task-executor="executor"/>
</int-file:inbound-channel-adapter>
<int-file:outbound-channel-adapter id="outputChannelAdapter" directory="${ok.file.path}" delete-source-files="true"/>
<task:executor id="executor" pool-size="10"/>
This processes all the incoming files with 10 threads. What steps do I need to take to split the files by groupId and have them processed with one thread per groupId?
Thanks.
Assuming a finite number of group ids, you could use a different adapter for each group (with a single thread; all feeding into the same channel); each with a different pattern.
Or you could create a custom FileListFilter and use some kind of thread affinity to assign files from each group to a specific thread, with the filter only returning this thread's file(s).
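A plain-Java sketch of the thread-affinity idea follows. Note that it is a swapped-in variant: instead of a FileListFilter, it dispatches each file's work to one of several single-threaded executors chosen by the group ID, so files of one group run in order while different groups can run in parallel. The class name GroupAffinityDispatcher and the lane count are assumptions, and groups whose hashes collide share a lane.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical dispatcher: one single-threaded "lane" per hash bucket of group IDs.
class GroupAffinityDispatcher {
    private final List<ExecutorService> lanes = new ArrayList<>();

    GroupAffinityDispatcher(int laneCount) {
        for (int i = 0; i < laneCount; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    /** The same groupId always maps to the same lane, which preserves its order. */
    int laneFor(String groupId) {
        return Math.floorMod(groupId.hashCode(), lanes.size());
    }

    void submit(String groupId, Runnable task) {
        lanes.get(laneFor(groupId)).execute(task);
    }

    /** Stops accepting work and waits for queued tasks to finish. */
    void shutdownAndWait() throws InterruptedException {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        for (ExecutorService lane : lanes) {
            lane.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GroupAffinityDispatcher dispatcher = new GroupAffinityDispatcher(4);
        List<String> processed = Collections.synchronizedList(new ArrayList<>());
        dispatcher.submit("groupA", () -> processed.add("groupA-file1"));
        dispatcher.submit("groupB", () -> processed.add("groupB-file1"));
        dispatcher.submit("groupA", () -> processed.add("groupA-file2"));
        dispatcher.shutdownAndWait();
        // groupA-file1 always precedes groupA-file2; groupB may interleave anywhere.
        System.out.println(processed);
    }
}
```

In the configuration above, the poller would then run on a single thread and the service activator would hand the file to the dispatcher, using whatever rule extracts the group ID from the file name.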

Reading of files with max-messages-per-poll=10 and prevent-duplicates=false

I'm trying to read files from a directory. If a file cannot be processed, it stays there to be retried later.
<file:inbound-channel-adapter prevent-duplicates="false" id="fileInput" directory="file:${java.io.dir}/input-data" auto-create-directory="true" filter="compositeFileFilterBean"/>
<integration:poller id="poller" max-messages-per-poll="10" default="true" >
<integration:interval-trigger interval="60" time-unit="SECONDS" />
</integration:poller>
The problem is that if max-messages-per-poll is set to, say, 10, then each poll returns exactly 10 messages, even if there is only 1 file (i.e. all 10 messages are the same).
Yes, that would be the expected behavior with those settings.
I am not sure why you think that is wrong.
If there is a file in the directory that is not filtered by a filter (such as the one that prevents duplicates), it will be found by the poller, either within the current poll (when max-messages-per-poll is > 1) or on the next poll.
To do what you want, you would need a custom filter that rejects a file that was already found within your 60-second polling interval.
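The core of such a filter could look roughly like the plain-Java sketch below. The class name RecentlySeenFilter and the explicit nowMillis parameter are assumptions for illustration; a real implementation would plug this logic into a Spring Integration FileListFilter and read the clock itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical filter core: a file is passed at most once per time window.
class RecentlySeenFilter {
    private final long windowMillis;
    // file name -> time (ms) at which it was last accepted
    private final Map<String, Long> lastAccepted = new HashMap<>();

    RecentlySeenFilter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Accepts the file unless it was already accepted less than windowMillis ago. */
    synchronized boolean accept(String fileName, long nowMillis) {
        Long last = lastAccepted.get(fileName);
        if (last != null && nowMillis - last < windowMillis) {
            return false;
        }
        lastAccepted.put(fileName, nowMillis);
        return true;
    }

    public static void main(String[] args) {
        RecentlySeenFilter filter = new RecentlySeenFilter(60_000);
        System.out.println(filter.accept("input.txt", 0));      // true: first time seen
        System.out.println(filter.accept("input.txt", 1_000));  // false: same poll cycle
        System.out.println(filter.accept("input.txt", 60_000)); // true: next poll, retry allowed
    }
}
```

Because polls do not fire at exact multiples of the interval, a window slightly shorter than the polling interval is probably the safer choice.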
You can:
Option 1. Set the "prevent-duplicates" property to true in the inbound-channel-adapter. This property is true by default only when no other filter, filename pattern, or regex is in place. If we use a custom filter, Spring assumes that our custom filter already includes an AcceptOnceFileListFilter, so it sets prevent-duplicates to false.
Option 2. Add the filter org.springframework.integration.file.filters.AcceptOnceFileListFilter to the "compositeFileFilterBean" bean.
