Poller metadata configuration for SFTP inbound adapter - Spring

I want to consume 25 files every minute from an SFTP directory and process 2 files concurrently. What poller metadata and inbound synchronization settings should I use? I also need to delete the files I have processed, both from my local store and from the SFTP directory.

I think I found the solution: max-messages-per-poll=2 and max-fetch-size=2. The adapter fetches both files and then emits each one.
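For reference, here is one possible Java DSL sketch of these settings; the session factory, directory names, poll interval, and the 2-thread pool are assumptions, not part of the original question:

import java.io.File;
import java.util.concurrent.Executors;
import com.jcraft.jsch.ChannelSftp;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.MessageChannels;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.file.remote.session.SessionFactory;
import org.springframework.integration.sftp.dsl.Sftp;

@Configuration
public class SftpInboundConfig {

    @Bean
    public IntegrationFlow sftpInboundFlow(SessionFactory<ChannelSftp.LsEntry> sessionFactory) {
        return IntegrationFlows
            .from(Sftp.inboundAdapter(sessionFactory)
                        .remoteDirectory("/remote/in")           // placeholder
                        .deleteRemoteFiles(true)                 // delete from the SFTP directory after sync
                        .maxFetchSize(2)                         // max-fetch-size=2
                        .localDirectory(new File("sftp-local")), // placeholder
                  e -> e.poller(Pollers.fixedDelay(4_800)        // 12.5 polls/minute x 2 files = ~25 files/minute
                                       .maxMessagesPerPoll(2)))  // max-messages-per-poll=2
            .channel(MessageChannels.executor(Executors.newFixedThreadPool(2))) // process 2 files concurrently
            .handle(File.class, (file, headers) -> {
                // ... process the file here ...
                file.delete(); // delete from the local store when done
                return null;   // end of flow
            })
            .get();
    }
}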

Related

Download one file at a time through the same session in Apache Camel FTP

I want to implement the following use case with Apache Camel FTP:
On a remote location, 0 to n files are stored. I want to:
1. When I receive a command, download one file (which one does not matter) as a byte array over FTP, if any files are available.
2. When the file is downloaded, save it in a database as a blob.
3. Then delete the stored/processed file from the remote location.
4. Wait for the next download command and, once it is received, go back to step 1.
The files have to be downloaded through the same FTP session.
My problem is that a normal FTP route downloads all available files.
When I tell the route to download only one, I have to create a new route for the other files and cannot reuse the FTP session.
Is there a way to implement this use case with Apache Camel FTP?
Camel-ftp doesn't consume all available files at once; it consumes them individually, one after another, so each file gets processed separately. If you need to process them in a specific order, you can sort by file name or modified date with the sortBy option.
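For example (host, port, and directory are placeholders):
from("ftp:host:port/directoryName?sortBy=file:modified")
.to("log:loggerName?showBody=true&showHeaders=true");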
If you want to control when a file gets downloaded, i.e. when your command arrives, you can call the FTP consumer endpoint using pollEnrich.
Example:
// 1. Loads one file from ftp-server with timeout of 3 seconds.
// 2. logs the body and headers
from("direct:example")
.pollEnrich("ftp:host:port/directoryName", 3000)
.to("log:loggerName?showBody=true&showHeaders=true");
You can call the direct endpoint with a ProducerTemplate obtained from the CamelContext, or change direct: to whatever consumer endpoint fits your use case.
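For instance, a minimal trigger sketch, where each send acts as the download command (camelContext is your CamelContext):
ProducerTemplate template = camelContext.createProducerTemplate();
// Each call runs the route once, i.e. downloads at most one file.
byte[] content = template.requestBody("direct:example", null, byte[].class);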
If you need a dynamic URI, you can use simple to provide the URI for pollEnrich, and provide the timeout afterwards.
from("direct:example")
.pollEnrich()
.simple("ftp:host:port/directoryName?fileName=${headers.targetFile}")
.timeout(3000)
.to("log:loggerName?showBody=true&showHeaders=true");

NiFi PutFTP not efficient

I have a NiFi flow that sends more than 50 files per minute using the PutFTP processor. The server has limited resources, but I need to send at a faster pace. I looked at the FTP server logs (not NiFi), and concluded:
A new FTP connection (session) is created for every file. Is there an option to send many files over one session? (Connect to port 21, authenticate once, and then send many files over separate data connections.)
When sending one file, many CWD (Change Working Directory) commands are sent. For example, sending a file to /myfiles/test/dest/file.txt:
CWD /
CWD /myfiles
CWD /
CWD /myfiles/test
CWD /
CWD /myfiles/test/dest
This is not efficient. Is there any way to improve PutFTP? Is this a bug?
First question: use run duration
A new FTP connection (session) is created for every file. Is there an option to send many files over one session? (Connect to port 21, authenticate once, and then send many files over separate data connections.)
First, if it fits your use case, you can use the MergeContent processor to merge multiple smaller flow files into one bigger flow file and feed that to PutFTP.
Second, the PutFTP processor has the SupportsBatching annotation:
Marker annotation a Processor implementation can use to indicate that users should be able to supply a Batch Duration for the Processor. If a Processor uses this annotation, it is allowing the Framework to batch ProcessSessions' commits, as well as allowing the Framework to return the same ProcessSession multiple times...
Source: https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/annotation/behavior/SupportsBatching.java
Increase the run duration of your PutFTP processor toward higher throughput so that the same task is used for many flow files. You might want to adjust the Maximum Batch Size in the properties tab to accommodate that change.
Read more about it here:
Documentation: Run duration
Understanding NiFi processor's "Run Duration" functionality.
What should be Ideal Run-duration and Run schedule configuration in nifi processors
Second question: inspect source code
When sending one file, many CWD (Change Working Directory) commands are sent. For example, sending a file to /myfiles/test/dest/file.txt
By inspecting FTPTransfer.java you can see that the put method does the following:
put -> get client
put -> get client -> resetWorkingDirectory -> changeWorkingDirectory(homeDirectory)
put -> setAndGetWorkingDirectory
This might be the behavior you discovered.

Talend - Read files from several FTP servers

I have several FTP servers (4 of them) on which files are generated by an application.
The application generates the same type of file, with the same structure, on all 4 servers.
With Talend, whenever a file changes on one of the servers, I want to retrieve its data and put it in ActiveMQ.
What could you suggest? In tFTP there is no tWaitForFile.
Staying within that architectural approach, you could poll the FTP servers to detect a change in a file's modified timestamp or size.
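For instance, a minimal polling sketch using Apache Commons Net; the host, credentials, and directory are placeholders, and a Talend job could perform the equivalent check with tFTPFileList on a schedule:

import java.util.HashMap;
import java.util.Map;
import org.apache.commons.net.ftp.FTPClient;
import org.apache.commons.net.ftp.FTPFile;

public class FtpChangeDetector {

    // last known modification time per file name
    private final Map<String, Long> lastSeen = new HashMap<>();

    public void poll(String host) throws Exception {
        FTPClient ftp = new FTPClient();
        ftp.connect(host);
        ftp.login("user", "password"); // placeholder credentials
        try {
            for (FTPFile f : ftp.listFiles("/outgoing")) { // placeholder directory
                long modified = f.getTimestamp().getTimeInMillis();
                Long previous = lastSeen.put(f.getName(), modified);
                if (previous == null || previous != modified) {
                    // New or changed file: fetch it and publish its data to ActiveMQ.
                }
            }
        } finally {
            ftp.logout();
            ftp.disconnect();
        }
    }
}

Run the same check against each of the 4 servers on a fixed schedule; file size can be compared the same way via FTPFile.getSize().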

Spring Integration AWS - S3-Inbound-Adapter - deleteSourceFiles and Backup files

We need to delete files after reading them from the S3 bucket, but I could not find an option for that. How do we delete the source file after reading it from the S3 bucket, and how do I back up files after successfully reading them?
Is there a way for the S3-Inbound-Adapter to output the payload to an output channel?
Appreciate your time and help!
Regards
Karthik
Karthik,
The S3-Inbound-Adapter works like all the other file-based adapters for remote file systems, meaning FTP and SFTP: we synchronize the remote directory with a local one and pick up files from there on each poll. So the remove operation should not be part of the send-file-to-output-channel process, because the remote transfer and the processing are separated in time.
I only see the option for you to delete/back up the file at the end of your reading process, e.g. with one more subscriber on a publish-subscribe-channel, or with <transaction-synchronization> on the <poller> of the <int-aws:s3-inbound-channel-adapter>.
But with that, you would somehow have to supply the AmazonS3Object properties to invoke AmazonS3.deleteObject(String bucketName, String key) at the end...
I'm sure that we will be able to transfer those options through the MessageHeaders, so feel free to raise a JIRA on the matter!
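A rough sketch of such a final subscriber; the channel name, bucket, and backup directory are hypothetical, and it assumes the synchronized local file keeps the S3 key as its name:

import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
import com.amazonaws.services.s3.AmazonS3;
import org.springframework.integration.annotation.ServiceActivator;

public class S3CleanUpSubscriber {

    private final AmazonS3 amazonS3;

    public S3CleanUpSubscriber(AmazonS3 amazonS3) {
        this.amazonS3 = amazonS3;
    }

    // Last subscriber on the publish-subscribe-channel, so it runs after processing.
    @ServiceActivator(inputChannel = "s3FilesChannel")
    public void cleanUp(File file) throws Exception {
        amazonS3.deleteObject("my-bucket", file.getName());              // delete the source object
        Files.copy(file.toPath(), Paths.get("backup", file.getName())); // keep a local backup
        file.delete();                                                   // remove the local copy
    }
}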

Pulling remote file with stream or batch job in Spring-XD?

I need to pull the content of a remote file, do some optional processing, and save it in a data repository. As for the file-transfer protocol, SFTP, HTTP, and FTP are the options.
Since SFTP is available as a source module in Spring-XD, is it doable to use it to pull file content as the source of a stream? Since SFTP is also available in Spring Batch, which solution is more sensible for pulling data from a remote file: Spring-XD SFTP or Spring Batch SFTP? What is the essential difference between them?
Other concerns include parallel transfer to achieve high throughput, and fault tolerance (resuming an interrupted transfer).
Thanks,
t.s.tao
