Pulling remote file with stream or batch job in Spring-XD?

I need to pull content from a remote file, do some optional processing, and save it in a data repository. As for the file-transfer protocol, SFTP, HTTP, and FTP are the options.
Since SFTP is available as a source module in Spring-XD, is it feasible to use it to pull file content as the source of a stream? SFTP is also available in Spring Batch, so which solution is more sensible: pulling data from the remote file with Spring-XD SFTP or with Spring Batch SFTP? What is the essential difference between them?
Other concerns include parallel transfers to achieve high throughput and fault tolerance (resuming an interrupted transfer).
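For the stream side, I imagine something like the following in the XD shell, with the sftp source feeding a processor and a sink. The host, credentials, and option names here are just placeholders for illustration and may differ by XD version:

    xd:> stream create --name sftpIngest --definition "sftp --host=files.example.com --user=demo --password=secret --remoteDir=/outbound --localDir=/tmp/xd/sftp | transform --expression=payload.toUpperCase() | file --dir=/tmp/xd/out" --deploy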
Thanks,
t.s.tao

Related

Can I delete a file in NiFi after sending messages to Kafka?

Hi, I'm using NiFi as an ETL tool.
This is my current process: I use TailFile to detect a CSV file and then send messages to Kafka.
It works fine so far, but I want to delete the CSV file after I send its contents to Kafka.
Is there any way?
Thanks
This depends on why you are using TailFile. From the docs,
"Tails" a file, or a list of files, ingesting data from the file as it is written to the file
TailFile is used to get new lines that are added to the same file as they are written. If you need to tail a file that is being written to, what condition determines that it is no longer being written to?
However, if you are just consuming complete files from the local file system, then you could use GetFile which gives the option to delete the file after it is consumed.
From a remote file system, you could use ListSFTP and FetchSFTP, where FetchSFTP has a Completion Strategy to move or delete the file.

How to read a remote file using URLs in Terminal

I would like to know if there is a way to read a file on a remote server using a URL. For example, I have been given a URL to a file on a remote server, and I need to read it using bash commands or any other tool to retrieve data (e.g., view the first 50 rows and write them to a file) without downloading the whole file to the local system.
The use case is to avoid downloading/uploading huge files located on the remote server to local systems, and to instead access the file content via the URL.
Any resources on this would help.
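For what it's worth, streaming the response and stopping early keeps the transfer to just the bytes you read; here is a minimal sketch in Java, assuming the URL is plain HTTP(S) and directly readable (from a terminal, curl piped to head behaves the same way):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.URL;

    public class RemoteHead {
        public static void main(String[] args) throws Exception {
            // Placeholder URL; any directly readable HTTP(S) file URL works.
            URL url = new URL("https://example.com/data/huge-file.csv");

            try (BufferedReader reader = new BufferedReader(
                         new InputStreamReader(url.openStream()));
                 PrintWriter out = new PrintWriter("first-50-rows.txt")) {
                String line;
                int count = 0;
                // Stream the response and stop after 50 lines; closing the
                // stream early means the rest of the file is never fetched.
                while (count < 50 && (line = reader.readLine()) != null) {
                    out.println(line);
                    count++;
                }
            }
        }
    }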

Spring Integration AWS - S3-Inbound-Adapter - deleteSourceFiles and Backup files

We need to delete files after reading them from the S3 bucket, but I could not find an option for this. How do we delete the source file after reading it from the S3 bucket, and how do I back up files after successfully reading them?
Is there a way to have the S3-Inbound-Adapter output the payload to an output channel?
Appreciate your time and help!
Regards
Karthik
Karthik,
The S3-Inbound-Adapter works like all the other remote-file-system adapters, i.e. FTP and SFTP: we synchronize the remote directory with a local one and pick up files from there on each poll. So the remove operation should not be part of the send-file-to-output-channel process, because the remote transfer and the processing are separated in time.
I only see the option to delete/back up the file at the end of your reading process, e.g. one more subscriber on the publish-subscribe-channel, or <transaction-synchronization> on the <poller> of <int-aws:s3-inbound-channel-adapter>.
But with that you would somehow have to supply the AmazonS3Object properties to invoke AmazonS3.deleteObject(String bucketName, String key) at the end...
I'm sure that we will be able to transfer those options through the MessageHeaders, so feel free to raise a JIRA on the matter!
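To make the second-subscriber idea concrete, here is a minimal sketch in Java, assuming the bucket name and key were made available as message headers. The header names below are hypothetical; as noted above, the adapter does not populate them yet:

    import com.amazonaws.services.s3.AmazonS3;
    import org.springframework.integration.annotation.ServiceActivator;
    import org.springframework.messaging.Message;

    public class S3SourceCleaner {

        private final AmazonS3 amazonS3;

        public S3SourceCleaner(AmazonS3 amazonS3) {
            this.amazonS3 = amazonS3;
        }

        // Second subscriber on the publish-subscribe channel: runs after the
        // main processing subscriber and removes the source object from S3.
        @ServiceActivator(inputChannel = "s3FilesChannel")
        public void deleteSource(Message<?> message) {
            // "s3_bucket" and "s3_key" are hypothetical header names; the
            // adapter does not populate them out of the box today.
            String bucket = (String) message.getHeaders().get("s3_bucket");
            String key = (String) message.getHeaders().get("s3_key");
            amazonS3.deleteObject(bucket, key);
        }
    }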

Generate a CSV file in Salesforce and transfer it to another server

I have generated an XML file in Salesforce, and now my problem is that I want to transfer this file to another server. Can I connect to the other server using FTP, or is there any other way to transfer the file?
This is an urgent task.
Any solutions would be gratefully accepted.
Phaniraj N
You can't FTP out. Where do you store the file in Salesforce? What's the server on the other end?
You could serve the file up as a public page via Sites, call a custom web service to transfer it, implement a web service to serve up the file, or use Data Loader to extract from Salesforce on a regular basis and then a batch job to upload to the other server. The list goes on, but we'll need a bit more information about what you're dealing with!

Efficiently creating tar files

Note: I'm using Windows file servers and .NET
If I were to create a TAR file from files on a remote file server (meaning, the TAR file would be created on the remote file server, where the original files are), would the bytes need to come to my machine and then go back to the file server (since my machine is running the code that's generating the TAR), or would they stay on the file server? I'm asking about the best possible (theoretical) implementation.
Thank you!
The bytes need to be where they are processed.
If you process them on your own machine (which is remote from the file server), they must be transferred.
If you process them on the file server itself, they don't need to be transferred.
If your goal is to minimize bandwidth usage, your best bet would be to have a script on the file server that generates the tar files for you when triggered from your machine.
The best possible implementation really depends on what your goals and constraints are.
The bytes would have to be read into your machine. The only way I know to do the tarring entirely on the remote server is to have the remote server itself generate the TAR; for example, you could connect via SSH and run a shell command there.
Unfortunately, in the scenario described, the TAR operation will use network bandwidth. You need to run the tar program on the file server to avoid using bandwidth.
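To make the SSH suggestion concrete, here is a minimal sketch in Java using the JSch library (my own choice of library, assuming the file server accepts SSH; host, credentials, and paths are placeholders):

    import com.jcraft.jsch.ChannelExec;
    import com.jcraft.jsch.JSch;
    import com.jcraft.jsch.Session;

    public class RemoteTar {
        public static void main(String[] args) throws Exception {
            JSch jsch = new JSch();
            // Placeholder host and credentials.
            Session session = jsch.getSession("user", "fileserver.example.com", 22);
            session.setPassword("secret");
            session.setConfig("StrictHostKeyChecking", "no"); // demo only
            session.connect();

            // tar runs entirely on the file server, so the file bytes never
            // cross the network; only this short command string does.
            ChannelExec channel = (ChannelExec) session.openChannel("exec");
            channel.setCommand("tar -cf /data/archive.tar -C /data files/");
            channel.connect();
            while (!channel.isClosed()) {
                Thread.sleep(100); // wait for the remote command to finish
            }
            channel.disconnect();
            session.disconnect();
        }
    }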
