Prevent duplicates across restarts in Spring Integration

I have to poll a directory and write entries to an RDBMS.
I wired up a Redis metadata store for the duplicates check. I see that the framework updates the Redis store with entries for all files in the folder [~140 files] long before the RDBMS entries get written. At the time of application termination, the RDBMS has logged only 90 files. On application restart, no more files are picked up from the folder.
Properties: msgs.per.poll=10, polling.interval=2000
How can I ensure the entries to Redis are made after writing to the DB, so that both are in sync and I don't miss any files?
<code>
<task:executor id="executor" pool-size="5" />
<int-file:inbound-channel-adapter channel="filesIn" directory="${input.Dir}" scanner="dirScanner" filter="compositeFileFilter" prevent-duplicates="true">
<int:poller fixed-delay="${polling.interval}" max-messages-per-poll="${msgs.per.poll}" task-executor="executor">
</int:poller>
</int-file:inbound-channel-adapter>
<int:channel id="filesIn" />
<bean id="dirScanner" class="org.springframework.integration.file.RecursiveLeafOnlyDirectoryScanner" />
<bean id="compositeFileFilter" class="org.springframework.integration.file.filters.CompositeFileListFilter">
<constructor-arg ref="persistentFilter" />
</bean>
<bean id="persistentFilter" class="org.springframework.integration.file.filters.FileSystemPersistentAcceptOnceFileListFilter">
<constructor-arg ref="metadataStore" />
</bean>
<bean name="metadataStore" class="org.springframework.integration.redis.metadata.RedisMetadataStore">
<constructor-arg name="connectionFactory" ref="redisConnectionFactory"/>
</bean>
<bean id="redisConnectionFactory" class="org.springframework.data.redis.connection.jedis.JedisConnectionFactory" p:hostName="localhost" p:port="6379" />
<int-jdbc:outbound-channel-adapter channel="filesIn" data-source="dataSource" query="insert into files values (:path,:name,:size,:crDT,:mdDT,:id)"
sql-parameter-source-factory="spelSource">
</int-jdbc:outbound-channel-adapter>
....
</code>

Artem is correct. You might as well extend the RedisMetadataStore and, at initialization time, flush the entries that are not in your database; this way you could keep using Redis and stay in sync with the DB. But this does couple things a little.
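For what it's worth, a rough sketch of that "flush on initialization" idea could look like the following. The Redis hash key, the files table and its path column, and the assumption that the metadata keys are the absolute file paths are all illustrative, not taken from the question:
<code>
import java.util.Set;

import javax.annotation.PostConstruct;

import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.integration.redis.metadata.RedisMetadataStore;
import org.springframework.jdbc.core.JdbcTemplate;

/**
 * Sketch only: on startup, drop Redis metadata entries whose file is not yet
 * in the "files" table, so those files are picked up again after a restart.
 * Assumes the metadata keys are the absolute file paths written by the filter
 * and that @PostConstruct processing (annotation-config) is enabled.
 */
public class DbSyncedRedisMetadataStore extends RedisMetadataStore {

    private static final String HASH_KEY = "fileMetadata"; // assumed Redis hash key

    private final StringRedisTemplate redisTemplate;

    private final JdbcTemplate jdbcTemplate;

    public DbSyncedRedisMetadataStore(RedisConnectionFactory connectionFactory, JdbcTemplate jdbcTemplate) {
        super(connectionFactory, HASH_KEY);
        this.redisTemplate = new StringRedisTemplate(connectionFactory);
        this.jdbcTemplate = jdbcTemplate;
    }

    @PostConstruct
    public void removeEntriesMissingFromDb() {
        Set<Object> keys = this.redisTemplate.opsForHash().keys(HASH_KEY);
        for (Object key : keys) {
            Integer count = this.jdbcTemplate.queryForObject(
                    "select count(*) from files where path = ?", Integer.class, key);
            if (count == null || count == 0) {
                remove((String) key); // not in the DB yet: let the filter accept this file again
            }
        }
    }

}
</code>
The coupling mentioned above is visible here: the metadata store now needs a JdbcTemplate against the same files table the JDBC outbound adapter writes to.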

How can I ensure entries to redis are made after writing to db
It isn't possible, because FileSystemPersistentAcceptOnceFileListFilter works before any message is sent, and only once, when FileReadingMessageSource.toBeReceived is empty. Of course, it tries to re-fetch files on the next application restart, but it can't, because your RedisMetadataStore already contains entries for those files.
I think in your case you have no choice but to use some custom JdbcFileListFilter based on your files table. Fortunately, your logic ends up with a file entry in that table anyway.
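Such a JdbcFileListFilter could be sketched roughly like this; the files table name comes from the insert statement above, while the path column is an assumption based on its :path parameter. It would be referenced from compositeFileFilter in place of persistentFilter:
<code>
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.springframework.integration.file.filters.FileListFilter;
import org.springframework.jdbc.core.JdbcTemplate;

/**
 * Sketch of a DB-backed filter: a file is accepted only while no row for its
 * path exists in the "files" table written by the JDBC outbound adapter.
 * The "path" column name is assumed from the :path insert parameter.
 */
public class JdbcFileListFilter implements FileListFilter<File> {

    private final JdbcTemplate jdbcTemplate;

    public JdbcFileListFilter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public List<File> filterFiles(File[] files) {
        List<File> accepted = new ArrayList<File>();
        if (files == null) {
            return accepted;
        }
        for (File file : files) {
            Integer count = this.jdbcTemplate.queryForObject(
                    "select count(*) from files where path = ?",
                    Integer.class, file.getAbsolutePath());
            if (count == null || count == 0) {
                accepted.add(file);
            }
        }
        return accepted;
    }

}
</code>
Note that, unlike the accept-once filter, this keeps re-accepting a file on every poll until its row shows up in the table, so duplicate in-flight messages are possible if processing is slower than the polling interval.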

Related

Spring Integration - Message Driven Channel Adapter Configuration To Reduce Consuming Interval

I have a queue that listens for messages coming from a service, configured like this:
<bean id="inboundQueue" class="com.rabbitmq.jms.admin.RMQDestination">
<constructor-arg value="myQueue"/>
<constructor-arg value="true"/>
<constructor-arg value="false"/>
</bean>
Then I have a message-driven channel adapter like this. I can't use an inbound-channel-adapter because my system has no tolerance for delay; message forwarding and timeliness are critical.
<int-jms:message-driven-channel-adapter id="inboundAdaptor"
auto-startup="true"
connection-factory="jmsConnectionFactory"
destination="inboundQueue"
channel="requestChannel"
error-channel="errorHandlerChannel"
receive-timeout="1000" />
When I started the project and checked the RabbitMQ console, I saw that Get (empty) was around 10/s. Increasing or decreasing the receive-timeout property has no effect on it.
When I stopped the project, Get (empty) dropped to 0, so I understood that the Get (empty) value is entirely caused by my message-driven-channel-adapter, and I guess it reflects the adapter's consuming interval.
What can I do to reduce this time?

Apache Ignite cache write is visible to other client after a delay

We have an 8-node Ignite cluster in production. Below is the cache configuration for one of the caches.
<bean id="cache-template-bean" abstract="true"
class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="inputDataCacheTemplate*"/>
<property name="cacheMode" value="PARTITIONED"/>
<property name="backups" value="1"/>
<property name="atomicityMode" value="ATOMIC"/>
<property name="dataRegionName" value="dr.prod.input"/>
<property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
<property name="writeSynchronizationMode" value="PRIMARY_SYNC"/>
<property name="statisticsEnabled" value="true"/>
<property name="affinity">
<bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
<property name="partitions" value="256"/>
</bean>
</property>
<property name="expiryPolicyFactory">
<bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
<constructor-arg>
<bean class="javax.cache.expiry.Duration">
<constructor-arg value="DAYS"/>
<constructor-arg value="7"/>
</bean>
</constructor-arg>
</bean>
</property>
</bean>
We are seeing strange behaviour. It is as follows:
1. Application A writes a record to the cache.
2. Application B tries to read that record.
3. Application B is unable to find the record in the cache, so it inserts a new one, thereby wiping the data entered by Application A.
Step 3 happens very rarely: there are about 1,000 such cache misses for the roughly 50M events we receive daily.
The gap between steps 1 and 2 is at least 20 ms.
We tried adding code in Application B that waits about 20 ms on the first cache miss. That reduced the misses by a large margin, but some remained. The fact that Application B can read the same record after a delay means that Application A is not failing on the insert, that no network factor is impacting inserts, and that it is not eviction or expiry. We also verified that the key used for the put in step 1 and the get in step 2 is the same.
What could be going on here? Please help.
I think it's more likely that you have a race condition:
1. Application B tries to read that record.
2. Application A writes a record to the cache.
3. Application B is unable to find the record in the cache, so it inserts a new one, thereby wiping the data entered by Application A.
Clients generally go to the primary partition to retrieve data, so it's incredibly unlikely that applications A and B are seeing different data.
The traditional way of dealing with this is with transactions, which would also work in Ignite.
A better option might be to use different APIs. For example, there are IgniteCache#getAndPutIfAbsent() and IgniteCache#putIfAbsent(), both of which do the check and write atomically without needing transactions.
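As a rough illustration of that approach (the cache name, key, and client configuration file below are made up for the example):
<code>
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class PutIfAbsentExample {

    public static void main(String[] args) {
        // Hypothetical client configuration and cache name.
        Ignite ignite = Ignition.start("client-config.xml");
        IgniteCache<String, String> cache = ignite.cache("inputDataCache");

        // Atomically: store the value only if the key is absent, and return the
        // previously stored value (null if this call actually inserted it).
        String existing = cache.getAndPutIfAbsent("event-123", "payload-from-B");
        if (existing != null) {
            // Application A won the race; reuse its record instead of overwriting it.
            System.out.println("Record already present: " + existing);
        }

        ignite.close();
    }

}
</code>
With putIfAbsent()/getAndPutIfAbsent(), whichever application loses the race sees the winner's value instead of silently overwriting it.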

Spring Integration AWS s3-inbound-streaming-channel-adapter stream from multiple s3 buckets

I am using XML-based Spring Integration and use an s3-inbound-streaming-channel-adapter to stream from a single S3 bucket.
We now have a requirement to stream from two S3 buckets.
So is it possible for one s3-inbound-streaming-channel-adapter to stream from multiple buckets?
Or would I need to create a separate s3-inbound-streaming-channel-adapter for each S3 bucket?
This is my current setup for a single S3 bucket, and it does work.
<int-aws:s3-inbound-streaming-channel-adapter
        channel="s3Channel"
        session-factory="s3SessionFactory"
        filter="acceptOnceFilter"
        remote-directory-expression="'bucket-1'">
    <int:poller fixed-rate="1000"/>
</int-aws:s3-inbound-streaming-channel-adapter>
Thanks in advance.
UPDATE:
I ended up having two s3-inbound-streaming-channel-adapters, as suggested by Artem Bilan below.
However, for each inbound adapter I had to declare separate instances of acceptOnceFilter and metadataStore.
This is because when I had only one instance of acceptOnceFilter and metadataStore, shared by the two inbound adapters, some odd looping started happening.
E.g. when file_1.csv arrived on bucket-1 and got processed, and the same file_1.csv was then put on bucket-2, the looping started. I don't know why, so I ended up creating an acceptOnceFilter and metadataStore for each inbound adapter.
<!-- ===================================================== -->
<!-- Region 1 s3-inbound-streaming-channel-adapter setting -->
<!-- ===================================================== -->
<bean id="metadataStore" class="org.springframework.integration.metadata.SimpleMetadataStore"/>

<bean id="acceptOnceFilter"
      class="org.springframework.integration.aws.support.filters.S3PersistentAcceptOnceFileListFilter">
    <constructor-arg index="0" ref="metadataStore"/>
    <constructor-arg index="1" value="streaming"/>
</bean>

<int-aws:s3-inbound-streaming-channel-adapter id="s3Region1"
        channel="s3Channel"
        session-factory="s3SessionFactory"
        filter="acceptOnceFilter"
        remote-directory-expression="'${s3.bucketOne.name}'">
    <int:poller fixed-rate="1000"/>
</int-aws:s3-inbound-streaming-channel-adapter>

<int:channel id="s3Channel">
    <int:queue capacity="50"/>
</int:channel>

<!-- ===================================================== -->
<!-- Region 2 s3-inbound-streaming-channel-adapter setting -->
<!-- ===================================================== -->
<bean id="metadataStoreRegion2" class="org.springframework.integration.metadata.SimpleMetadataStore"/>

<bean id="acceptOnceFilterRegion2"
      class="org.springframework.integration.aws.support.filters.S3PersistentAcceptOnceFileListFilter">
    <constructor-arg index="0" ref="metadataStoreRegion2"/>
    <constructor-arg index="1" value="streaming"/>
</bean>

<int-aws:s3-inbound-streaming-channel-adapter id="s3Region2"
        channel="s3ChannelRegion2"
        session-factory="s3SessionFactoryRegion2"
        filter="acceptOnceFilterRegion2"
        remote-directory-expression="'${s3.bucketTwo.name}'">
    <int:poller fixed-rate="1000"/>
</int-aws:s3-inbound-streaming-channel-adapter>

<int:channel id="s3ChannelRegion2">
    <int:queue capacity="50"/>
</int:channel>
That's correct; the current implementation supports only a single remote directory to poll periodically. We are working at this very moment on formalizing such a solution as an out-of-the-box feature. A similar request has been reported for the (S)FTP support, especially when the target directory is not known in advance at configuration time.
If it is not a big deal for you to configure a separate channel adapter for each directory, that would be the way to go. You can always send the messages from all of them to the same channel for processing.
Otherwise you can consider looping over the list of buckets via the remote-directory-expression:
<xsd:attribute name="remote-directory-expression" type="xsd:string">
<xsd:annotation>
<xsd:documentation>
Specify a SpEL expression which will be used to evaluate the directory
path to where the files will be transferred
(e.g., "headers.['remote_dir'] + '/myTransfers'" for outbound endpoints)
There is no root object (message) for inbound endpoints
(e.g., "#someBean.fetchDirectory");
</xsd:documentation>
</xsd:annotation>
</xsd:attribute>
i.e., by pointing the expression at a method on one of your beans.
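As a sketch of that idea, the expression could delegate to a small bean of your own that returns the next bucket on each poll. The bean and method names below are hypothetical, and the exact bean-reference syntax in the expression depends on your Spring Integration version:
<code>
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper: each evaluation of the adapter's
 * remote-directory-expression (e.g. "@bucketRotator.nextBucket()",
 * along the lines of the "#someBean.fetchDirectory" example in the
 * XSD documentation above) returns the next bucket, so successive
 * polls cycle through the configured buckets.
 */
public class BucketRotator {

    private final List<String> buckets;

    private final AtomicInteger index = new AtomicInteger();

    public BucketRotator(List<String> buckets) {
        this.buckets = buckets;
    }

    public String nextBucket() {
        return this.buckets.get(Math.floorMod(this.index.getAndIncrement(), this.buckets.size()));
    }

}
</code>
Keep in mind that with a rotating expression only one bucket is visited per poll, so all buckets are covered only across several polling cycles.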

(Spring Batch) Pollable channel with replies contains ChunkResponses even if the job is successfully completed

I have the following chunk writer configuration for getting the replies from Spring Batch remote chunking:
<bean id="chunkWriter" class="org.springframework.batch.integration.chunk.ChunkMessageChannelItemWriter" scope="step">
<property name="messagingOperations" ref="messagingGateway" />
<property name="replyChannel" ref="masterChunkReplies" />
<property name="throttleLimit" value="5" />
<property name="maxWaitTimeouts" value="30000" />
</bean>
<bean id="messagingGateway" class="org.springframework.integration.core.MessagingTemplate">
<property name="defaultChannel" ref="masterChunkRequests" />
<property name="receiveTimeout" value="2000" />
</bean>
<!-- Remote Chunking Replies From Slave -->
<jms:inbound-channel-adapter id="masterJMSReplies"
destination="remoteChunkingRepliesQueue"
connection-factory="remoteChunkingConnectionFactory"
channel="masterChunkReplies">
<int:poller fixed-delay="10" />
</jms:inbound-channel-adapter>
<int:channel id="masterChunkReplies">
<int:queue />
<int:interceptors>
<int:wire-tap channel="loggingChannel"/>
</int:interceptors>
</int:channel>
My remotely chunked step runs perfectly, all data is processed with very good performance, and all steps end in the COMPLETED state. The problem is that the masterChunkReplies queue channel still contains ChunkResponses after the end of the job. The documentation doesn't say anything about this; is that a normal state?
As a result, I can't run a new job, because it then crashes at:
Message contained wrong job instance id ["
+ jobInstanceId + "] should have been [" + localState.getJobId() + "]."
There is a simple workaround, cleaning the masterChunkReplies queue channel at the start of the job, but I'm not sure whether it is correct...
Can you please clarify this?
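For reference, that cleanup workaround could look something like the following sketch, assuming masterChunkReplies is the QueueChannel from the configuration above and the listener is registered on the job via its <listeners> element; class and bean names are illustrative:
<code>
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.integration.channel.QueueChannel;

/**
 * Sketch of the workaround: drain stale ChunkResponses from the
 * masterChunkReplies queue channel before each job run.
 */
public class StaleRepliesCleaner implements JobExecutionListener {

    private final QueueChannel masterChunkReplies;

    public StaleRepliesCleaner(QueueChannel masterChunkReplies) {
        this.masterChunkReplies = masterChunkReplies;
    }

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // Remove any replies left over from a previous run.
        this.masterChunkReplies.clear();
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // Nothing to do after the job.
    }

}
</code>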
Gary, I found the root cause.
On the slaves, if I change the following chunk-consumer JMS adapter:
<jms:message-driven-channel-adapter id="slaveRequests"
connection-factory="remoteChunkingConnectionFactory"
destination="remoteChunkingRequestsQueue"
channel="chunkRequests"
concurrent-consumers="10"
max-concurrent-consumers="50"
acknowledge="transacted"
receive-timeout="5000"
idle-task-execution-limit="10"
idle-consumer-limit="5"
/>
for
<jms:inbound-channel-adapter id="jmsRequests" connection-factory="remoteChunkingConnectionFactory"
destination="remoteChunkingRequestsQueue"
channel="chunkRequests"
acknowledge="transacted"
receive-timeout="5000"
>
<int:poller fixed-delay="100"/>
</jms:inbound-channel-adapter>
then it works: the masterChunkReplies queue is consumed completely at the end of the job. However, any attempt to consume chunkRequests on the slaves in parallel doesn't work; the masterChunkReplies queue then contains unconsumed ChunkResponses, so starting new jobs ends in
Message contained wrong job instance id ["
+ jobInstanceId + "] should have been [" + localState.getJobId() + "]."
Gary, does it mean that slaves cannot consume ChunkRequests in parallel?
Gary, after a few days of struggling I finally made it work, even with parallel ChunkRequests consumption on the slaves and with an empty masterChunkReplies pollable channel at the end of the job. Changes:
On the master, I replaced the polled inbound channel adapter consuming ChunkResponses (taken straight from the GitHub examples) with a message-driven adapter using the same level of multithreading with which the slaves consume ChunkRequests. I had a feeling that the master was consuming ChunkResponses too slowly, which is why there were leftover ChunkResponses at the end of the job.
I had also misconfigured the remotely chunked step. My fault.
I haven't tested it on more than one node yet, but I now think it works as it should.
Thank you very much for help.
regards
Tomas

How is ordering preserved in ActiveMQ?

I have set up an application to listen to an ActiveMQ topic. Here's the way I have configured it:
<jms:listener-container connection-factory="jmsFactory"
container-type="default" destination-type="durableTopic" client-id="CMY-LISTENER"
acknowledge="transacted">
<jms:listener destination="CMY.UPDATES"
ref="continuingStudiesCourseUpdateListener" subscription="CMY-LISTENER" />
</jms:listener-container>
<bean id="jmsFactoryDelegate" class="org.apache.activemq.ActiveMQConnectionFactory">
<property name="brokerURL" value="${jmsFactory.brokerURL}" />
<property name="redeliveryPolicy">
<bean class="org.apache.activemq.RedeliveryPolicy">
<property name="maximumRedeliveries" value="10" />
<property name="initialRedeliveryDelay" value="60000" />
<property name="redeliveryDelay" value="60000" />
<property name="useExponentialBackOff" value="true" />
<property name="backOffMultiplier" value="2" />
</bean>
</property>
</bean>
The problem I have is this:
1. I put 10 messages into the topic.
2. If the first message is read and the application fails to process the task, it rolls back the message.
3. 1 minute later, it retries to read the first message and process it. It fails and rolls back.
4. 2 minutes later, it retries and rolls back.
5. 4 minutes later... etc.
It gets stuck on the first message, and the next 9 messages don't get read until the first one is dealt with.
Is this the way a topic is supposed to work? Is there a way I can have my 9 other messages read while the first one is waiting to be retried?
It's working just like it's supposed to; that's the nature of transactional message processing. You can't process the other messages until the first one completes or is discarded based on the rules in your redelivery policy.
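To see where the 1, 2, 4 minute gaps described in the question come from, here is a small sketch (plain Java, not the ActiveMQ API) of how the configured RedeliveryPolicy values combine:
<code>
/**
 * With initialRedeliveryDelay=60000, useExponentialBackOff=true and
 * backOffMultiplier=2, redeliveries happen roughly 1, 2, 4, 8, ... minutes
 * apart, up to maximumRedeliveries=10.
 */
public class RedeliveryDelays {

    public static void main(String[] args) {
        long delay = 60_000L;            // initialRedeliveryDelay
        int maximumRedeliveries = 10;    // maximumRedeliveries
        double backOffMultiplier = 2.0;  // backOffMultiplier
        for (int attempt = 1; attempt <= maximumRedeliveries; attempt++) {
            System.out.printf("redelivery %d after ~%d minute(s)%n", attempt, delay / 60_000);
            delay = (long) (delay * backOffMultiplier);
        }
    }

}
</code>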
You might want to read through the JMS tutorial here:
