Add Batch capabilities to Integration flow - Spring

I currently have a Spring Integration flow that works fine (see link for diagram). I would like to add Batch on top of my current configuration to allow for retry with exponential back-off, circuit breaker pattern, and persisting jobs to the database for restart.
The Integration flow consists of a Gateway that takes a Message<MyObj>, which is eventually routed to a Transformer that converts Message<MyObj> to a Message<String>. The Aggregator then takes Message<String> and eventually releases a concatenated Message<String> (using both a size release-strategy and a MessageGroupStoreReaper with a timeout). The concatenated String is then the payload of the File uploaded using SFTP outbound-channel-adapter.
I have searched, read through docs, looked at tons of examples, and I can't figure out how to encapsulate the last step of the process into a Batch Job. I need the ability to retry uploading the String (as payload of File) if there is an SFTP connection issue or other Exception thrown during the upload. I also want to be able to restart (using database backed JobRepository) in case of some failure, so I don't think using Retry Advice is sufficient.
Please explain and help me understand how to wire the pieces together and which to use (job-launching-gateway, MessageToJobRequest Transformer, ItemReader, ItemWriter??). I'm also unsure how to access each Message<String> and send it to the SFTP channel-adapter inside of a Job, Step, or Tasklet.
Current flow: [diagram not included]

First of all, let's take a look at how we can meet your requirements without Batch.
<int-sftp:outbound-channel-adapter> has a <request-handler-advice-chain>, where you can configure a RequestHandlerRetryAdvice (with an exponential back-off policy) and a RequestHandlerCircuitBreakerAdvice.
To achieve the restartable option, you can make the input channel of that adapter a persistent queue backed by a message-store.
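For example, a minimal Java-config sketch of that combination (the bean names, back-off values, threshold, and the "toSftp" group id are illustrative, not prescribed):

    import org.aopalliance.aop.Advice;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.integration.channel.QueueChannel;
    import org.springframework.integration.handler.advice.RequestHandlerCircuitBreakerAdvice;
    import org.springframework.integration.handler.advice.RequestHandlerRetryAdvice;
    import org.springframework.integration.jdbc.store.JdbcChannelMessageStore;
    import org.springframework.integration.store.MessageGroupQueue;
    import org.springframework.messaging.MessageChannel;
    import org.springframework.retry.backoff.ExponentialBackOffPolicy;
    import org.springframework.retry.support.RetryTemplate;

    @Configuration
    public class SftpResilienceConfig {

        // Retry with exponential back-off; reference this bean from the
        // adapter's <request-handler-advice-chain>.
        @Bean
        public Advice retryAdvice() {
            RequestHandlerRetryAdvice advice = new RequestHandlerRetryAdvice();
            RetryTemplate retryTemplate = new RetryTemplate();
            ExponentialBackOffPolicy backOff = new ExponentialBackOffPolicy();
            backOff.setInitialInterval(1_000);
            backOff.setMultiplier(2.0);
            backOff.setMaxInterval(30_000);
            retryTemplate.setBackOffPolicy(backOff);
            advice.setRetryTemplate(retryTemplate);
            return advice;
        }

        // Circuit breaker: open after 3 failures, try again after 10 seconds.
        @Bean
        public Advice circuitBreakerAdvice() {
            RequestHandlerCircuitBreakerAdvice advice = new RequestHandlerCircuitBreakerAdvice();
            advice.setThreshold(3);
            advice.setHalfOpenAfter(10_000);
            return advice;
        }

        // Persistent, DB-backed input channel for the adapter, so unsent
        // messages survive a restart.
        @Bean
        public MessageChannel toSftpChannel(JdbcChannelMessageStore messageStore) {
            return new QueueChannel(new MessageGroupQueue(messageStore, "toSftp"));
        }
    }

You would then reference retryAdvice and circuitBreakerAdvice from the adapter's <request-handler-advice-chain> and point the adapter's channel at toSftpChannel.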
Now about Batch.
To start a Job from the Integration flow, you should write a MessageToJobRequest transformer and follow it with a <batch-int:job-launching-gateway>. There, of course, you can place your payload into the jobParameters.
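A minimal sketch of such a transformer, assuming the concatenated String released by your Aggregator should become a job parameter (the class and parameter names are illustrative):

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.integration.launch.JobLaunchRequest;
    import org.springframework.integration.annotation.Transformer;
    import org.springframework.messaging.Message;

    public class PayloadToJobRequest {

        private final Job job;

        public PayloadToJobRequest(Job job) {
            this.job = job;
        }

        @Transformer
        public JobLaunchRequest toRequest(Message<String> message) {
            JobParametersBuilder params = new JobParametersBuilder()
                    .addString("payload", message.getPayload())
                    // a timestamp keeps the JobParameters unique, so every
                    // launch creates a fresh, restartable JobInstance
                    .addLong("launchTime", System.currentTimeMillis());
            return new JobLaunchRequest(job, params.toJobParameters());
        }
    }

The output of this transformer is sent to the <batch-int:job-launching-gateway>, which launches the Job and records its state in the (database-backed) JobRepository for restart.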
To send a message from the Job to some channel (e.g. to the SFTP adapter), you can use org.springframework.batch.integration.chunk.ChunkMessageChannelItemWriter.
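Wiring that writer follows the remote-chunking arrangement from the Spring Batch Integration docs; a sketch, assuming a requests channel that leads into your SFTP flow and a replies channel for the acknowledgements:

    import org.springframework.batch.core.configuration.annotation.StepScope;
    import org.springframework.batch.integration.chunk.ChunkMessageChannelItemWriter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.integration.core.MessagingTemplate;
    import org.springframework.messaging.MessageChannel;
    import org.springframework.messaging.PollableChannel;

    @Configuration
    public class ChunkWriterConfig {

        // sends each chunk of items as a message on the "requests" channel
        @Bean
        public MessagingTemplate messagingTemplate(MessageChannel requests) {
            MessagingTemplate template = new MessagingTemplate();
            template.setDefaultChannel(requests);
            template.setReceiveTimeout(2_000);
            return template;
        }

        @Bean
        @StepScope
        public ChunkMessageChannelItemWriter<String> itemWriter(
                MessagingTemplate messagingTemplate, PollableChannel replies) {
            ChunkMessageChannelItemWriter<String> writer = new ChunkMessageChannelItemWriter<>();
            writer.setMessagingOperations(messagingTemplate);
            writer.setReplyChannel(replies); // acknowledgements come back here
            return writer;
        }
    }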
Read more here: http://docs.spring.io/spring-batch/reference/html/springBatchIntegration.html

Related

How to know the running status of a Spring Integration flow

I have a simple integration flow that polls data from a database based on a cron trigger, publishes it on a DirectChannel, then does a split and some transformations, publishes on another executor-service channel, does some operations, and finally publishes to an output channel; it's written using the DSL style.
Also, I have an endpoint where I might receive an HTTP request to trigger this flow; at that point I send messages to one of the mentioned channels to trigger the flow.
I want to make sure that the manual trigger doesn't happen if the flow is already running due to either the cron job or another request.
I have used the isRunning method of the StandardIntegrationFlow, but it seems that it's not thread-safe.
I also tried using .wireTap(myService) and .handle(myService), where this service has an AtomicBoolean flag, but it got set for every message, which is not a solution.
I want to know if the flow is running without much intervention from my side, and if this is not supported, how can I apply the AtomicBoolean logic to the overall flow and not to every message?
How can I simulate the race condition in a test in order to make sure my implementation prevents this?
The IntegrationFlow is just a logical container for the configuration phase. It does have those lifecycle methods, but only for internal framework logic. Even though they are there, they don't help, because endpoints are always running so that they can react to an event or input message.
It is hard to control all of that, since it is in an async state, as you explain. Even if we stop the SourcePollingChannelAdapter at the beginning of the flow to let your manual call do something, it doesn't mean that messages in other threads are no longer in process. The AtomicBoolean cannot help here for the same reason: even if you set it to true in MessageSourceMutator.beforeReceive() and reset it back to false in its afterReceive() when the message is null, it still doesn't mean that the messages pushed downstream on other threads have already been processed.
You might consider using an aggregator to reset the AtomicBoolean at the end of a batch: since you mention that you pull data from a DB, there is perhaps a number of records per poll that you can track downstream. This way your manual call would be skipped until the aggregator has collected the results for that batch.
You also need to think about stopping the SourcePollingChannelAdapter at the moment the manual action is permitted, so there won't be any further race conditions with the cron.
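One way to realize that idea is a MessageSourceMutator (a poller advice) that guards the poll with an AtomicBoolean which something downstream, e.g. the aggregator's release logic, resets. A rough sketch, with all names hypothetical:

    import java.util.concurrent.atomic.AtomicBoolean;

    import org.springframework.integration.aop.MessageSourceMutator;
    import org.springframework.integration.core.MessageSource;
    import org.springframework.messaging.Message;

    public class BatchGuard implements MessageSourceMutator {

        private final AtomicBoolean inFlight = new AtomicBoolean();

        @Override
        public boolean beforeReceive(MessageSource<?> source) {
            // returning false skips this poll while a batch is still in flight
            return inFlight.compareAndSet(false, true);
        }

        @Override
        public Message<?> afterReceive(Message<?> result, MessageSource<?> source) {
            if (result == null) {
                inFlight.set(false); // nothing was polled, release immediately
            }
            return result;
        }

        // call this at the end of the flow, e.g. when the aggregator
        // releases the group for the current poll
        public void batchCompleted() {
            inFlight.set(false);
        }
    }

The advice is registered on the poller of the SourcePollingChannelAdapter, and the manual HTTP trigger can consult (and set) the same flag before sending its message.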

SCDF: WSDL source - Spring Cloud Task, Spring Cloud Stream, or any other solution?

We have a requirement to get data from a SOAP web service, where the same records are going to be exposed. Each record is then transformed and written to the DB.
We are the active side, and at certain intervals we are going to check whether a new record has appeared.
Our main goals are:
to have a scheduler for setting the intervals
to have a mechanism to retry if something goes wrong (e.g. a lost connection)
to have visual control of the process - to check the places where something got stuck (like the dashboard in SCDF)
Since there is no sample WSDL source app, I guess the Task (or Stream?) has to be written by ourselves. But what should we use for repeating and scheduling?
I need your advice in choosing the right approach.
I'm not tied to the SCDF solution if any other is more suitable.
If you intend to consume SOAP messages directly from external services, you could either build a custom Spring Cloud Stream source or a simple Spring Batch/Spring Cloud Task application. Both options provide the resiliency patterns, including retries.
However, if the upstream data is not real-time, you would choose the Task path, because streams are long-running and never terminate. Tasks, on the other hand, run for a finite period of time, terminate, and free up resources. There's also the option to use a platform-specific scheduler implementation to launch the Task periodically on a recurring schedule.
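A minimal Spring Cloud Task skeleton for that poll-transform-persist cycle might look like this (the SOAP call and record handling are placeholders to be filled in):

    import org.springframework.boot.CommandLineRunner;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.task.configuration.EnableTask;
    import org.springframework.context.annotation.Bean;

    @SpringBootApplication
    @EnableTask
    public class WsdlPollTask {

        public static void main(String[] args) {
            SpringApplication.run(WsdlPollTask.class, args);
        }

        @Bean
        CommandLineRunner poll() {
            return args -> {
                // 1. call the SOAP service (e.g. via a WebServiceTemplate)
                // 2. transform the new records
                // 3. write them to the DB
                // The task then terminates and frees its resources; the
                // scheduler launches a fresh instance at the next interval.
            };
        }
    }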
From the SCDF dashboard, you can design/build Composed Tasks, including the state transitions and the desired downstream operation.

Spring Integration reading many files

We have a requirement to parse lots of incoming files (arriving in a directory), process them, and put the outcome onto AWS Kinesis for each file.
The volume can be 60,000 files per day, and files can arrive every 15 seconds. Each file may contain about 1000 entries.
Can spring-integration handle this load?
Would there be any issues processing these kinds of volumes?
As the files come in on an inbound-channel-adapter, can we execute a service-activator for each file?
I believe we need to use task-executors on channels with a poller? Any examples?
Would the task-executors call the service-activators in a multi-threaded manner?
Any pointers would be helpful. Links to code examples would be nice.
This is not the kind of question one asks here on SO - it's too broad, with too many questions in a single thread. I assume that even if I answer all of them, you are going to ask more, and SO is not good for Q&A chat. Anyway:
Yes, Spring Integration can handle this. You can use a simple FileReadingMessageSource to poll the directory periodically.
Each file (as the message payload) can be fed to a FileSplitter to parse it line by line.
After the splitter you can indeed use an ExecutorChannel to process those lines in parallel.
The Service Activator can be called in a multi-threaded environment as long as it is thread-safe.
In the end you can use a KinesisMessageHandler to send records to AWS Kinesis. And yes, this one can be used from different threads as well.
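Put together with the Java DSL, the whole flow could look roughly like this (the directory, pool size, and stream name are placeholders, and the exact DSL factory names vary a little between Spring Integration versions):

    import java.io.File;
    import java.util.concurrent.Executors;

    import com.amazonaws.services.kinesis.AmazonKinesisAsync;

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.integration.aws.outbound.KinesisMessageHandler;
    import org.springframework.integration.config.EnableIntegration;
    import org.springframework.integration.dsl.IntegrationFlow;
    import org.springframework.integration.dsl.IntegrationFlows;
    import org.springframework.integration.dsl.Pollers;
    import org.springframework.integration.file.dsl.Files;

    @Configuration
    @EnableIntegration
    public class FilesToKinesisConfig {

        @Bean
        public IntegrationFlow filesToKinesis(AmazonKinesisAsync amazonKinesis) {
            KinesisMessageHandler kinesis = new KinesisMessageHandler(amazonKinesis);
            kinesis.setStream("my-stream");   // placeholder stream name
            kinesis.setPartitionKey("files"); // placeholder partition key

            return IntegrationFlows
                    .from(Files.inboundAdapter(new File("/data/incoming")),
                            e -> e.poller(Pollers.fixedDelay(1000).maxMessagesPerPoll(10)))
                    .split(Files.splitter())                   // one Message<String> per line
                    .channel(c -> c.executor(Executors.newFixedThreadPool(8))) // ExecutorChannel
                    .<String, String>transform(String::toUpperCase) // stand-in for your thread-safe service
                    .handle(kinesis)
                    .get();
        }
    }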
You can find all of this information in the Spring Integration Reference Manual. Some of the Samples may help you as well, and the Spring Integration AWS extension is there for you.

Processing a single message using Spring JMS

I am working on an existing Spring jms application that pulls messages from ActiveMQ using jms:inbound-gateway. This application is a job processor that takes jobs off a queue and sends results back to the queue. Everything currently works great.
I would like to modify this application to accept one and only one job, process it, and then exit, but I have not been able to find a way to do this cleanly. The method that is called must return the results, and the results are automatically placed back onto the queue by Spring. Is there any way to tell Spring to stop accepting messages after one? And how would you know when it has finished sending the reply message, so you can exit?
In the more general case, if you had an application that wanted to stop accepting messages and finish processing the ones in flight so it could exit cleanly, how could that be done?
Thanks in advance for any advice.
It sounds like you should be using JmsTemplate.receive. The inbound gateway is meant for a message-driven model. While you are using a message queue for transport, you're not really doing message-driven processing. Obviously that's not a problem, but if your aim is to process a single message, you should just use JmsTemplate.
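A sketch of that single-shot receive-process-reply cycle (the queue name and runJob method are placeholders):

    import javax.jms.Destination;
    import javax.jms.JMSException;
    import javax.jms.Message;

    import org.springframework.jms.core.JmsTemplate;

    public void processExactlyOneJob(JmsTemplate jmsTemplate) throws JMSException {
        jmsTemplate.setReceiveTimeout(5_000);            // don't block forever on an empty queue
        Message request = jmsTemplate.receive("jobs");   // placeholder queue name
        if (request != null) {
            String result = runJob(request);             // your existing processing logic
            Destination replyTo = request.getJMSReplyTo();
            jmsTemplate.convertAndSend(replyTo, result); // reply sent; now you can exit cleanly
        }
    }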
Another way to do this is to use the DefaultMessageListenerContainer, ensure the number of consumer threads is set to one, and make sure your ActiveMQ setup has no read-ahead (i.e. it reads one message at a time and no more). Then, at the end of your onMessage method, call the DMLC's stop method.
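A rough sketch of that container setup; for ActiveMQ, the read-ahead (prefetch) can be disabled on the connection factory, and the stop is issued off the listener thread so the in-flight message completes first:

    import java.util.concurrent.Executors;

    import javax.jms.ConnectionFactory;
    import javax.jms.MessageListener;

    import org.apache.activemq.ActiveMQConnectionFactory;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.jms.listener.DefaultMessageListenerContainer;

    @Configuration
    public class SingleMessageConfig {

        @Bean
        public ConnectionFactory connectionFactory() {
            ActiveMQConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
            cf.getPrefetchPolicy().setQueuePrefetch(0); // no read-ahead: fetch one message at a time
            return cf;
        }

        @Bean
        public DefaultMessageListenerContainer container(ConnectionFactory connectionFactory) {
            DefaultMessageListenerContainer dmlc = new DefaultMessageListenerContainer();
            dmlc.setConnectionFactory(connectionFactory);
            dmlc.setDestinationName("jobs");  // placeholder queue name
            dmlc.setConcurrentConsumers(1);   // exactly one consumer thread
            dmlc.setMessageListener((MessageListener) message -> {
                // ... process the job and send the reply ...
                // stop asynchronously; calling stop() on the listener thread itself can block
                Executors.newSingleThreadExecutor().submit(dmlc::stop);
            });
            return dmlc;
        }
    }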

Spring Batch or JMS for long running jobs

I have the problem that I have to run very long-running processes from my web service, and now I'm looking for a good way to handle the results. The scenario: a user starts such a long-running process via the UI. He then gets the message that his request was accepted and that he should come back some time later. So there's no need to display the status of his request or anything like that. I'm just looking for a way to handle the result of the long-running process properly.

Since the processes are external programs, my application server is not aware of them, and I have to wait for these programs to terminate. Of course I don't want to use EJBs for this, because they would block as long as no result is available. Instead I thought of using JMS or Spring Batch. Has anyone had the same problem, or any advice on which solution would be better?
It really depends on what forms of communication your external programs have available. JMS is a very good approach and is immediately available in your app server, but it might not be the best option if your external program is a long-running DB query that dumps its result into a text file...
The main advantage of Spring Batch over "just" using JMS as an asynchronous communications channel is its transactional properties, allowing the infrastructure to retry failed jobs, group jobs together, and such. Without knowing more about your specific setup, it is hard to give detailed advice.
Cheers,
I had a similar design requirement: users were sending XML files and I had to generate documents from them. Using JMS in this case is advantageous, since you can always add new instances of these processors, which consume and execute the jobs in parallel.
You can use a timer task to check the status of these processes or monitor them. Also, you can publish a message to a JMS queue once a process completes.
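As a sketch, the consuming side of such a setup can be as small as a @JmsListener that does the long-running work and then publishes a completion event (the queue names and document logic are illustrative):

    import org.springframework.jms.annotation.JmsListener;
    import org.springframework.jms.core.JmsTemplate;
    import org.springframework.stereotype.Component;

    @Component
    public class DocumentJobProcessor {

        private final JmsTemplate jmsTemplate;

        public DocumentJobProcessor(JmsTemplate jmsTemplate) {
            this.jmsTemplate = jmsTemplate;
        }

        // several instances/threads can consume and execute jobs in parallel
        @JmsListener(destination = "jobs.requests", concurrency = "1-4")
        public void onJob(String jobXml) {
            String documentLocation = generateDocument(jobXml); // the long-running external work
            jmsTemplate.convertAndSend("jobs.completed", documentLocation);
        }

        private String generateDocument(String jobXml) {
            // placeholder for invoking the external program and waiting for it
            return "/documents/out.pdf";
        }
    }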
