I have a Spring Batch application which reads from a file, does some processing and finally writes a customized output. This all happens in one step. In the next step I have a tasklet which archives the input files (moves them to another folder). This application works fine. But now I have a requirement to SFTP the output files to a remote server where they will be processed further. I found a way to SFTP using Spring Integration, where I created an input channel which feeds an outbound channel adapter. I put my files as the payload in a message and send the messages to that channel. The only problem I see is that every time I need the context I have to load the Spring config file again, which seems like a hackish way to do the task. Does anyone know of a better way to integrate Spring Integration with Spring Batch?
Let me know if you want to see my config...
Thanks in Advance !!
Code to access the same application context without loading the Spring config again:
import org.springframework.beans.BeansException;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;

// Spring injects the context once at startup; the static field makes it
// reachable from code that is not itself a Spring bean.
public class AppContextProvider implements ApplicationContextAware {
    private static ApplicationContext ctx;
    public ApplicationContext getApplicationContext() {
        return ctx;
    }
    @Override
    public void setApplicationContext(ApplicationContext appContext) throws BeansException {
        ctx = appContext;
    }
}
Code to push the output file to the SFTP server:
log.info("Starting transfer of outputFile : " + absoluteOutputFileName);
final File file = new File(absoluteOutputFileName);
final Message<File> message = MessageBuilder.withPayload(file).build();
AppContextProvider context = new AppContextProvider();
final MessageChannel inputChannel = context.getApplicationContext().getBean("toChannel",MessageChannel.class);
inputChannel.send(message);
log.info("transfer complete for : " + absoluteOutputFileName);
Take a look at the spring-batch-integration module within the Spring Batch project. In there, we have components for launching jobs via messages. In your situation, you'd FTP the file down then have the JobLaunchingMessageHandler launch the job.
You can also watch this video of a talk I co-presented at SpringOne a couple years ago on this topic: https://www.youtube.com/watch?v=8tiqeV07XlI
As Michael said, you'll definitely want to look at and leverage spring-batch-integration. We actually use Spring Integration as a wrapper of sorts to launch 100% of our Spring Batch jobs.
One use case we've found particularly useful is leveraging the spring-integration-file Inbound Channel Adapters to poll staging directories to indicate when a new batch file has landed. As the poller finds a new file, we then launch a new batch job using the input filename as a parameter.
This has been a real help when it comes to restartability, because we now have one job instance per file as opposed to having a job kick off at arbitrary intervals and then partition across however many files happen to be in the staging folder. Now if an exception occurs during processing, you can target a specific job for restart immediately rather than waiting for 99 of the 100 "good" files to finish first.
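For illustration only (this is not the exact configuration described above), a minimal Java-config sketch of that pattern might look like the following; the /staging directory, the *.xml pattern, the perFileJob bean and the parameter name are all assumptions:
@Bean
public IntegrationFlow stagingDirectoryFlow(JobLauncher jobLauncher, Job perFileJob) {
    return IntegrationFlows
            .from(Files.inboundAdapter(new File("/staging"))        // assumed staging directory
                            .patternFilter("*.xml"),
                    e -> e.poller(Pollers.fixedDelay(10000)))
            // one JobLaunchRequest per file; the identifying filename parameter
            // gives one job instance per file, which keeps restarts targeted
            .transform(File.class, file -> new JobLaunchRequest(perFileJob,
                    new JobParametersBuilder()
                            .addString("input.file.name", file.getAbsolutePath())
                            .toJobParameters()))
            .handle(new JobLaunchingGateway(jobLauncher))
            .channel("nullChannel")                                  // discard the JobExecution reply
            .get();
}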
I am new to Spring Integration and have read quite a lot of documentation and other topics here on StackOverflow. But I am still a bit overwhelmed about how to apply the newly acquired knowledge in a Spring Boot application.
This is what should happen:
1. Receive a message from a Kafka topic, e.g. from "request-topic" (the payload is a custom Job POJO). InboundChannelAdapter?
2. Do some preparation (checkout from a git repo).
3. Process files using a batch job.
4. Commit & push to git, update the Job object with the commit-id.
5. Publish a message to Kafka with the updated Job object, e.g. to "reply-topic". OutboundChannelAdapter?
Using the DSL or plain Java configuration does not matter. My problem, after trying several variants, is that I could not achieve the desired result. For example, handlers would be called too early, or not at all, and thus the reply in step 5 would not be updated.
Also, there should only be one flow running at any given time, so I guess a queue should be involved at some point, probably at step 1(?).
Where and when should I use QueueChannels, DirectChannels (or any other), and do I need GatewayHandlers, e.g. to reply with a commit-id?
Any hints are appreciated.
Something like this:
@Bean
IntegrationFlow flow() {
    return IntegrationFlows.from(Kafka.inboundGateway(...))
            .handle(...)      // prep
            .transform(...)   // to JobLaunchRequest
            .handle(...)      // JobLaunchingGateway
            .handle(...)      // clean up and return result
            .get();
}
It will only process one request at a time (with default concurrency).
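To make the shape a bit more concrete, here is one possible way those placeholders might be filled in. This is only a sketch, not part of the original answer: MyJob (the custom POJO) and its getId() accessor, the "jobId" parameter, the "originalJob" header, and the container, replyTemplate and batchJob beans are all assumptions, and the Kafka deserializer is assumed to produce MyJob payloads.
@Bean
IntegrationFlow flow(ConcurrentMessageListenerContainer<String, MyJob> container,
                     KafkaTemplate<String, MyJob> replyTemplate,
                     JobLauncher jobLauncher, Job batchJob) {
    return IntegrationFlows.from(Kafka.inboundGateway(container, replyTemplate))
            .handle(MyJob.class, (job, headers) -> {
                // prep: e.g. clone/checkout the git repo
                return job;
            })
            // keep the original POJO around so it can be returned as the reply
            .enrichHeaders(h -> h.headerFunction("originalJob", m -> m.getPayload()))
            .transform(MyJob.class, job -> new JobLaunchRequest(batchJob,
                    new JobParametersBuilder()
                            .addString("jobId", job.getId())      // assumed identifying parameter
                            .toJobParameters()))
            .handle(new JobLaunchingGateway(jobLauncher))         // runs the batch job (synchronously by default)
            .handle(JobExecution.class, (execution, headers) -> {
                // clean up: commit & push, then reply with the updated Job object
                MyJob job = (MyJob) headers.get("originalJob");
                return job;
            })
            .get();
}
The reply from the last handler flows back through the inbound gateway to the reply topic, which covers step 5.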
I am new to the Spring Integration framework. Currently I am working on a project which has a requirement to download files to a local directory.
My goal is to complete the tasks below:
1. Download the files to a local directory using Spring Integration.
2. Trigger a batch job, i.e. read the file and extract a specific column's information.
I am able to connect to the SFTP server, but I am facing difficulty with how to use the Spring Integration Java DSL to download the files and trigger a batch job.
Below is the code to connect to the SFTP session factory:
@Bean
public SessionFactory<ChannelSftp.LsEntry> sftpSessionFactory() {
    DefaultSftpSessionFactory factory = new DefaultSftpSessionFactory(true);
    factory.setHost(sftpHost);
    factory.setPort(sftpPort);
    factory.setUser(sftpUser);
    if (sftpPrivateKey != null) {
        factory.setPrivateKey(sftpPrivateKey);
        factory.setPrivateKeyPassphrase(privateKeyPassPhrase);
    } else {
        factory.setPassword(sftpPassword); // use the configured password, not the literal "sftpPassword"
    }
    factory.setAllowUnknownKeys(true);
    logger.info("Connecting to SFTP Server" + factory.getSession());
    return new CachingSessionFactory<ChannelSftp.LsEntry>(factory);
}
Below is the code to download the files from remote to local:
@Bean
public IntegrationFlowBuilder integrationFlow() {
    return IntegrationFlows.from(Sftp.inboundAdapter(sftpSessionFactory()));
}
I am using the Spring Integration Java DSL, but I am not able to work out what to code here.
I have tried many possible ways to do this, but I cannot figure out how to proceed with this requirement.
Can anyone help me with how to approach this and, if possible, share some sample code for reference?
The Sftp.inboundAdapter() produces messages with a File as the payload. So, with that IntegrationFlows.from(Sftp.inboundAdapter(sftpSessionFactory())) you can treat the first task as done.
Your problem from here is that you don't make an IntegrationFlow, but rather return that IntegrationFlowBuilder and register it as a @Bean. That's why it doesn't work for you.
You need to continue the flow definition and call its get() at the end to return an IntegrationFlow instance, which is what has to be registered as a bean. If this code style is a bit confusing, consider implementing an IntegrationFlowAdapter as a @Component.
To trigger a batch job, consider using a FileMessageToJobRequest in a .transform() EIP-method and then a JobLaunchingGateway in a .handle() EIP-method.
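A rough sketch of such a flow is below. The directories, poll interval and the fileProcessingJob bean name are assumptions, and the JobLaunchRequest is built in an inline transformer here instead of the dedicated FileMessageToJobRequest class shown in the docs linked next:
@Bean
public IntegrationFlow sftpToBatchFlow(JobLauncher jobLauncher, Job fileProcessingJob) {
    return IntegrationFlows
            .from(Sftp.inboundAdapter(sftpSessionFactory())
                            .remoteDirectory("/remote/dir")              // assumption: adjust to your server
                            .localDirectory(new File("local-dir"))
                            .autoCreateLocalDirectory(true),
                    e -> e.poller(Pollers.fixedDelay(5000)))
            // build the JobLaunchRequest from the downloaded File
            .transform(File.class, file -> new JobLaunchRequest(fileProcessingJob,
                    new JobParametersBuilder()
                            .addString("input.file.name", file.getAbsolutePath())
                            .toJobParameters()))
            .handle(new JobLaunchingGateway(jobLauncher))
            .channel("nullChannel")                                      // discard the JobExecution reply
            .get();
}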
See more info in docs:
https://docs.spring.io/spring-integration/reference/html/dsl.html#java-dsl
https://docs.spring.io/spring-integration/reference/html/sftp.html#sftp-inbound
https://docs.spring.io/spring-batch/docs/4.3.x/reference/html/spring-batch-integration.html#spring-batch-integration-configuration
BTW, the last one has a flow sample exactly for your use-case.
I am learning Spring Cloud Task and I have written a simple application that is divided into 3 services. The first is a TaskApplication that has only a main() method and implements CommandLineRunner; the second is a TaskIntakeApplication that receives requests and sends them to RabbitMQ; the third is a TaskLauncherApplication that receives messages from RabbitMQ and runs the task with the received parameters.
@Component
@EnableBinding(Source.class)
public class TaskProcessor {

    @Autowired
    private Source source;

    public void publishRequest(String arguments) {
        final String url = "maven://groupId:artifactId:jar:version";
        final List<String> args = Arrays.asList(arguments.split(","));
        final TaskLaunchRequest request = new TaskLaunchRequest(url, args, null, null, "TaskApplication");
        final GenericMessage<TaskLaunchRequest> message = new GenericMessage<>(request);
        source.output().send(message);
    }
}
As you can see, I reference my built artifact by its Maven URL, but I wonder how I can launch an artifact from another Docker container?
If you intend to launch a task application from an upstream event (e.g., a new file event; a new DB record event; a new message in Rabbit event, etc.,), you'd simply use the respective out-of-the-box applications and then launch the task via the Task Launcher.
Follow this example of how the three steps are orchestrated via SCDF's DSL.
Perhaps you could consider reusing the existing apps instead of reinventing them, unless you have a completely different requirement that these apps cannot meet. I'd suggest getting the example mentioned above working locally before you consider extending the behavior.
Two questions on Spring Batch; can someone please shed more light on these?
1) I have implemented registerShutdownHook in my Spring Batch project, but when I kill my batch process it does not stop immediately. It waits until the entire batch process is completed. Is that how it works?
public static void main(String[] args) {
    final AbstractApplicationContext appContext =
            new AnnotationConfigApplicationContext("com.lexisnexis.batch", "com.lexisnexis.rules");
    appContext.registerShutdownHook();
    ...
}
Does it need to stop all running batches when we kill the process with this registerShutdownHook code?
2) What is the best way to restart all stopped jobs?
Yes, that's how it works. The shutdown hook is there to close the Spring application context gracefully. Please check the Spring Framework documentation.
To restart your stopped job you need to invoke jobOperator.restart(executionId), or use jobLauncher.run with the same parameters you used to start the original job.
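For example, a rough sketch of restarting every stopped execution of a job, assuming JobOperator and JobExplorer beans are available and that the job is registered as "myJob" (both names are assumptions):
@Autowired
private JobOperator jobOperator;

@Autowired
private JobExplorer jobExplorer;

public void restartStoppedExecutions() throws Exception {
    // find every instance of the job, then restart any execution left in STOPPED state
    for (JobInstance instance : jobExplorer.getJobInstances("myJob", 0, Integer.MAX_VALUE)) {
        for (JobExecution execution : jobExplorer.getJobExecutions(instance)) {
            if (execution.getStatus() == BatchStatus.STOPPED) {
                jobOperator.restart(execution.getId());   // resumes from the last committed point
            }
        }
    }
}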
I've implemented a simple Spring Batch solution that reads data from an XML file and writes it to a database. The solution works well but due to scalability requirements I'm converting this simple implementation to a more scalable implementation that uses Remote Chunking. I've run into an issue attempting to access the JobExecutionContext from the remote ItemWriter. My ItemWriter is shown below (non essentials removed for brevity) and works fine in the non remote chunking implementation.
public Object myObject;
public void write(List<? extends Employee> items) throws Exception
{
// do stuff
// use myObject here
}
public void beforeStep(StepExecution stepExecution_p)
{
myObject = stepExecution_p.getJobExecution().getExecutionContext().get("myObject");
}
However, when I run the remote chunking implementation, the beforeStep method above isn't invoked, so I don't have a handle on myObject when the write method is later invoked. I understand that beforeStep is not being invoked before the step begins because the ItemReader and ItemWriter are distributed and running in separate JVMs; therefore there is no way for beforeStep to be invoked on the remote ItemWriter.
I need to find a way of getting a handle on the stepExecutionContext in my remote ItemWriter so that I can get myObject from that context. Is there a way to get a handle on the stepExecutionContext in the remote ItemWriter, or any other way of passing useful data into the remote ItemWriter? I looked at late binding in the remote ItemWriter, but apparently step scoping of remote components (the ItemWriter in this case) is not supported.
<bean id="itemWriterSlave" class="com.....ItemWriterSlave" scope="step">
<property name="myObject" value="#{stepExecutionContext[myObject]}" />
</bean>
Does anyone know how I can access data passed from previous step (via the stepExecutionContext) in a remote ItemWriter? Any suggestions would be greatly appreciated.
Spring Batch provides two different ways to do multi-JVM scaling: remote partitioning and remote chunking. With remote partitioning, while the slave steps are not part of the job that contains the master step, they are still true steps that have all the rights and responsibilities of a Spring Batch step (access to the job repository, the slaves get StepExecutions, etc.).
However, remote chunking is different. The slaves are not actually true steps. They are really just remotely deployed components used by the master. Because of that, they don't have StepExecutions or related ExecutionContexts. The best way to pass data from previous steps to slaves in remote chunking is to persist it somewhere outside of the job.
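As an illustration of that last point, here is a rough sketch of a slave writer that pulls the shared value from an external store instead of the ExecutionContext; SharedDataRepository is a hypothetical DAO (e.g. backed by a database table both JVMs can reach), not a Spring Batch class:
import java.util.List;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.InitializingBean;

public class ItemWriterSlave implements ItemWriter<Employee>, InitializingBean {

    private final SharedDataRepository sharedData;   // hypothetical DAO shared by both JVMs
    private Object myObject;

    public ItemWriterSlave(SharedDataRepository sharedData) {
        this.sharedData = sharedData;
    }

    @Override
    public void afterPropertiesSet() {
        // the master persists "myObject" to the shared store before sending chunks
        this.myObject = sharedData.find("myObject");
    }

    @Override
    public void write(List<? extends Employee> items) throws Exception {
        // use myObject while writing, exactly as in the non-remote writer
    }
}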