We have a Spring Integration application that monitors an incoming folder and processes the files it finds there.
When the application is down for maintenance or some other reason, the upstream application fills the incoming folder with around 100K files.
When we restart the application, it freezes and does not process the incoming files; it may be trying to load all of them at once.
Here is the configuration:
<file:inbound-channel-adapter id="inFiles" channel="inFilesin" directory="file:${incoming.folder}"
queue-size="300" filename-regex="(?i)^(?!.*writing) " auto-startup="true" auto-create-directory="false" >
<int:poller id="fw.fileInboudPoller" fixed-rate="1" receive-timeout="3" time-unit="SECONDS"
max-messages-per-poll="10" task-executor="taskExecutor" />
</file:inbound-channel-adapter>
<task:executor id="taskExecutor" pool-size="10-20" queue-capacity="20" rejection-policy="CALLER_RUNS" />
Appreciate your help.
Thanks,
Mohan
I suggest changing fixed-rate to fixed-delay.
Your files are processed very slowly, and fixed-rate means that a new task should be started as soon as that interval has elapsed (1 second in your case), measured from the start of the previous one.
Another problem is rejection-policy="CALLER_RUNS". In this case, if your thread queue is exhausted (and with 100K files it will be), the scheduler thread does the real work.
To schedule tasks, the poller uses a ThreadPoolTaskScheduler with a pool size of 10. So under this heavy load your app may freeze, because that pool is shared by the whole application.
So, try to use fixed-delay. In this case your app won't freeze, but files will be processed more slowly.
Maybe this can help you as well: <int:resource-inbound-channel-adapter>?
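As a rough sketch of that suggestion, the poller from the question could be changed as follows. Only the fixed-rate attribute is swapped for fixed-delay; every other attribute is copied from the question unchanged, and the surrounding adapter element is omitted for brevity.

```xml
<!-- fixed-delay: the interval is measured from the END of the previous scheduled run,
     so a slow run (for example one forced onto the scheduler thread by CALLER_RUNS)
     pushes the next poll back instead of letting polls pile up. -->
<int:poller id="fw.fileInboudPoller" fixed-delay="1" receive-timeout="3" time-unit="SECONDS"
        max-messages-per-poll="10" task-executor="taskExecutor" />
```

Whether this alone is enough for a 100K-file backlog is not something the thread confirms; it only removes the pressure that fixed-rate puts on the shared scheduler.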
Versions:
Spring: 5.2.16.RELEASE
Spring Integration: 5.3.9.RELEASE
macOS Big Sur: 11.6
For a fuller account of my spring-integration configuration, see this question I posted yesterday.
To sum up, I have set up this channel for polling changes to a directory:
<int-file:inbound-channel-adapter id="channelIn" directory="${channel.dir}" auto-create-directory="false" use-watch-service="false" filter="channelFilter" watch-events="CREATE,MODIFY">
<int-file:nio-locker ref="channelLocker"/>
<int:poller fixed-delay="${channel.polling.delay}" max-messages-per-poll="${channel.polling.maxmsgs}"></int:poller>
</int-file:inbound-channel-adapter>
It works fine. However, there does not appear to be a configuration option to start the polling after application-start by some arbitrary delay. In my case, I don't think there is any program error (yet) in starting the polling service immediately after Tomcat container starts my war-file. But it is also true that there is quite a bit going on during the application-start, and my preference would be to defer the inception of the polling service some time after the bean for SourcePollingChannelAdapter is created.
Is there any way to do this in Spring?
There are (at least) a couple of options:
Instead of fixed-delay, use the trigger property to point to a PeriodicTrigger bean with an initialDelay (and fixedDelay).
Set auto-startup="false" and start the adapter manually either directly, or using a control bus.
https://docs.spring.io/spring-integration/docs/current/reference/html/system-management.html#control-bus
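A minimal sketch of both options, assuming the Spring 5.2 / Spring Integration 5.3 versions listed in the question. The bean id channelTrigger, the 60000 ms initial delay, and the controlChannel name are made up for illustration; the adapter attributes are shortened from the question.

```xml
<!-- Option 1: a PeriodicTrigger with an initial delay.
     The period is in milliseconds by default, and the trigger uses
     fixed-delay semantics unless fixedRate is set to true. -->
<bean id="channelTrigger" class="org.springframework.scheduling.support.PeriodicTrigger">
    <constructor-arg type="long" value="${channel.polling.delay}"/>
    <property name="initialDelay" value="60000"/> <!-- hypothetical: wait 60 s before the first poll -->
</bean>

<int-file:inbound-channel-adapter id="channelIn" directory="${channel.dir}"
        auto-create-directory="false" filter="channelFilter">
    <int-file:nio-locker ref="channelLocker"/>
    <!-- trigger replaces fixed-delay; the two cannot be combined on one poller -->
    <int:poller trigger="channelTrigger" max-messages-per-poll="${channel.polling.maxmsgs}"/>
</int-file:inbound-channel-adapter>

<!-- Option 2: declare the adapter with auto-startup="false" and start it later
     by sending a message with the payload "@channelIn.start()" to this channel. -->
<int:channel id="controlChannel"/>
<int:control-bus input-channel="controlChannel"/>
```

With option 2 something still has to decide when to send the start command, for example a listener that reacts once application startup has finished.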
I have a Spring Boot MVC and batch application. The batch and MVC parts share the DAO and service layers, so they are in the same war file. The war is deployed to 4 cloud servers, with a load balancer and a VIP configured for the UI application, so the MVC application is fine.
The problem is that, as part of the batch, I FTP a file to an external server, and that external server FTPs the processed file back. The processed file comes back to only one of the 4 servers, so I want the batch to run on only 1 server. How do I suppress the batch from executing on the other servers?
The solution becomes easier because your 4 instances are running on 4 different cloud servers. The starting point of the batch can be a file poller. If the file is dropped into the polled directory on server 1, the batch job on server 1 will be invoked. The other instances do nothing, as no file is dropped on those servers.
You need to put a file poller in front of Spring Batch. Something like this: http://docs.spring.io/spring-batch/reference/html/springBatchIntegration.html
<int:channel id="inboundFileChannel"/>
<int:channel id="outboundJobRequestChannel"/>
<int:channel id="jobLaunchReplyChannel"/>
<int-file:inbound-channel-adapter id="filePoller"
channel="inboundFileChannel"
directory="file:/tmp/myfiles/"
filename-pattern="*.csv">
<int:poller fixed-rate="1000"/>
</int-file:inbound-channel-adapter>
<int:transformer input-channel="inboundFileChannel"
output-channel="outboundJobRequestChannel">
<bean class="io.spring.sbi.FileMessageToJobRequest">
<property name="job" ref="personJob"/>
<property name="fileParameterName" value="input.file.name"/>
</bean>
</int:transformer>
<batch-int:job-launching-gateway request-channel="outboundJobRequestChannel"
reply-channel="jobLaunchReplyChannel"/>
This is only one of many approaches, but one way to achieve it is to keep a Boolean flag in a properties file and set its value to true on the server that should run the batch.
Then make your batch run only if that property value is true.
This gives you the flexibility to change which server handles the batch job.
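One possible way to wire such a flag, sketched under the assumption that the batch is started by a file poller like the one shown in the other answer. The property name batch.poller.enabled is hypothetical, and the ${...:false} default relies on a property placeholder configurer being present, as it is in Spring Boot.

```xml
<!-- On the one server that should run the batch, set in its properties file:
         batch.poller.enabled=true
     On the other servers leave the property unset (or false). -->
<int-file:inbound-channel-adapter id="filePoller"
        channel="inboundFileChannel"
        directory="file:/tmp/myfiles/"
        filename-pattern="*.csv"
        auto-startup="${batch.poller.enabled:false}">
    <int:poller fixed-rate="1000"/>
</int-file:inbound-channel-adapter>
```

With auto-startup resolved to false the adapter is created but never polls, so the job is never launched on that server; moving the batch to another server is then a property change rather than a code change.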
I am running the latest Spring 4.1.0 and Spring Batch 3.0.1, using:
<bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
<property name="transactionManager" ref="transactionManager" />
</bean>
<bean id="transactionManager" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>
I have a job that executes every few seconds. It is a very basic ETL job: it checks whether the DB has some data, transforms it, and pushes it to another system. If there is nothing to be done, it tries again on the next run.
I have noticed that memory consumption by HashMap objects keeps going up even if the job has nothing to do. So is MapJobRepositoryFactoryBean keeping state in memory? I assume it is... I also tried the persisted job repository, and with that the memory consumption stays low.
For operational purposes and simplicity I don't care about the job history and previous states of jobs, so I don't really need a persisted repository. I just want something to run every few seconds.
So is there another way to clean up the state of MapJobRepositoryFactoryBean automatically, or to not track the full state, and keep memory low?
Yes, the map-based job repository will keep the state in memory forever. As the javadoc states (http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/core/repository/support/MapJobRepositoryFactoryBean.html), that JobRepository implementation really isn't intended for production use. If you want a memory-based JobRepository that you can clean up, use an in-memory database and execute cleanup scripts periodically.
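A minimal sketch of the in-memory database variant, assuming HSQLDB is on the classpath and the jdbc namespace is declared. It replaces the two beans from the question; the schema script is the one shipped inside the spring-batch-core jar. The periodic cleanup itself (deleting old rows from the BATCH_* tables) is not shown and would have to be written separately.

```xml
<!-- Embedded, in-memory HSQLDB initialised with the Spring Batch metadata tables -->
<jdbc:embedded-database id="dataSource" type="HSQL">
    <jdbc:script location="classpath:org/springframework/batch/core/schema-hsqldb.sql"/>
</jdbc:embedded-database>

<bean id="transactionManager"
        class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
    <property name="dataSource" ref="dataSource"/>
</bean>

<!-- JDBC-backed repository instead of MapJobRepositoryFactoryBean -->
<bean id="jobRepository"
        class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean">
    <property name="dataSource" ref="dataSource"/>
    <property name="transactionManager" ref="transactionManager"/>
</bean>
```

Nothing is persisted across restarts with this setup, which matches the stated wish not to keep job history.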
I'm working on abstracting out any sort of messaging framework from some code I'm working on. Basically, I'm using a combination of Spring AOP and Spring Integration to generate messages without the Java code knowing anything about RabbitMQ, JMS, or even Spring Integration. The code that generates the messages is contained in its own .jar, and it is reused by several other areas of the application.
I currently have the messaging system set up so that the channels on which messages are sent are specified by the code that calls the system (i.e., channels are generated automatically based on the external method invocation): the caller specifies the channel name in a message header, and a header-value router creates the channel if it doesn't exist.
My issue is at the endpoint of these channels. The intention of the current structure is to allow Spring to switch to any messaging system as requirements specify or change. I know how to take a static channel and use outbound channel adapters/gateways to send its messages to a pre-specified RabbitMQ/JMS queue and process from there. What I'm struggling with is how to tell Spring that every channel created by the router needs a RabbitMQ (or whatever other messaging system gets implemented) outbound channel adapter that is generated dynamically based on the channel name, since we don't know the channel names beforehand.
Is this possible? And if not, would you mind providing input as to what could perhaps be a better way?
Thanks ahead of time!
Here's a basic template of what my config file looks like - I have an initial channel ("messageChannel") which gets sent to a publish-subscribe-channel and queuing channel depending on one of the message headers and is routed from there.
<!--Header value based channel configurations-->
<int:channel id="messageChannel" />
<int:channel id="queue" />
<int:publish-subscribe-channel id="topic" />
<!--Header-based router to route to queue or topic channels-->
<int:header-value-router input-channel="messageChannel"
header-name="#{ T(some.class.with.StringConstants).CHANNEL_TYPE}" />
<!--Re-routes messages according to their destination and messaging type-->
<int:header-value-router input-channel="queue"
header-name="#{ T(some.class.with.StringConstants).MESSAGE_DESTINATION}" />
<int:header-value-router input-channel="topic"
header-name="#{ T(some.class.with.StringConstants).MESSAGE_DESTINATION}" />
<!--AOP configuration - picks up on any invocation of some.class.which.generates.Messages.generateMessage()
from a Spring-managed context.-->
<aop:config>
<aop:pointcut id="eventPointcut"
expression="execution(* some.class.which.generates.Messages.generateMessage(..))" />
<aop:advisor advice-ref="interceptor" pointcut-ref="eventPointcut"/>
</aop:config>
<int:publishing-interceptor id="interceptor" default-channel="messageChannel">
<int:method pattern="generateMessage" payload="#return" channel="messageChannel" />
</int:publishing-interceptor>
See the dynamic-ftp sample; it uses a dynamic router that creates new outbound endpoints/channels on demand.
I am using Spring Integration to poll a directory for a File, process this file in a service class, write this file to an output directory and then delete the original file.
I have the following XML configuration:
<int-file:inbound-channel-adapter id="filesInChannel"
directory="file:${java.io.tmpdir}/input"
auto-create-directory="true" >
<int:poller id="poller" fixed-delay="1000" />
</int-file:inbound-channel-adapter>
<int:service-activator id="servicActivator"
input-channel="filesInChannel"
output-channel="filesOut"
ref="my_file_processing_service">
</int:service-activator>
<int-file:outbound-channel-adapter id="filesOut" auto-create-directory="true" delete-source-files="true" directory="file:${java.io.tmpdir}/output"/>
This polls the file, passes it to my processing_service and copies it to the outbound directory. However the original file is not being deleted. Does anyone have any idea as to why not?
I know that the question was asked a long time ago but maybe the answer will be useful to someone else.
The reason why the input file is not deleted is provided in the Spring Integration Reference:
The delete-source-files attribute will only have an effect if the
inbound Message has a File payload or if the FileHeaders.ORIGINAL_FILE
header value contains either the source File instance or a String
representing the original file path.
Your message does not contain this particular header. If you use one of the standard file transformers (FileToStringTransformer and FileToByteArrayTransformer) it will be set automatically. Alternatively you can set it manually using a header enricher.
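A sketch of the header-enricher variant, applied to the configuration from the question. The intermediate channel name filesToProcess is made up for illustration; file_originalFile is the string value of the FileHeaders.ORIGINAL_FILE constant. It assumes the service activator's reply keeps the request headers, which is the default when the service returns a plain payload.

```xml
<!-- Copy the polled File (the payload at this point) into the ORIGINAL_FILE header
     before the service replaces the payload with its result. -->
<int:header-enricher input-channel="filesInChannel" output-channel="filesToProcess">
    <int:header name="file_originalFile" expression="payload"/>
</int:header-enricher>

<int:service-activator id="servicActivator"
        input-channel="filesToProcess"
        output-channel="filesOut"
        ref="my_file_processing_service"/>
```

The outbound adapter with delete-source-files="true" can then find the source file through the header even though the payload is no longer a File.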
Behind the scenes something like this is happening in the file transformers:
...
Message<?> transformedMessage = MessageBuilder.withPayload(result)
.copyHeaders(message.getHeaders())
.setHeaderIfAbsent(FileHeaders.ORIGINAL_FILE, file)
.setHeaderIfAbsent(FileHeaders.FILENAME, file.getName())
.build();
...
From the documentation http://static.springsource.org/spring-integration/reference/html/files.html
<int-file:outbound-gateway id="mover" request-channel="moveInput"
reply-channel="output"
directory="${output.directory}"
mode="REPLACE" delete-source-files="true"/>
I don't know how to do this on the inbound-channel-adapter (which I think makes sense).