Spring Integration AWS - S3-Inbound-Adapter - deleteSourceFiles and Backup files

We need to delete files after reading them from the S3 bucket, but I could not find an option for this. How do we delete the source file after reading it from the S3 bucket, and how do I back up files after successfully reading them?
Is there a way for the S3-Inbound-Adapter to output the payload to an output channel?
Appreciate your time and help!
Regards
Karthik

Karthik,
The S3-Inbound-Adapter works like all the other remote file-based adapters, such as FTP and SFTP: we synchronize the remote directory with a local one and pick up files from there on each poll. So the remove operation should not be part of the send-to-output-channel process, simply because the remote transfer and the processing are separated in time.
The only option I see is to delete/back up the file at the end of your reading process, e.g. with one more subscriber on a publish-subscribe-channel, or with <transaction-synchronization> on the <poller> of the <int-aws:s3-inbound-channel-adapter>.
But with that you would somehow have to supply the AmazonS3Object properties in order to invoke AmazonS3.deleteObject(String bucketName, String key) at the end...
I'm sure we will be able to carry those options in the MessageHeaders, so feel free to raise a JIRA on the matter!
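For reference, such a second subscriber could be a plain POJO handler along these lines. This is only a sketch: the header names s3_bucket and s3_key are hypothetical and would have to be populated upstream, since the adapter does not provide them out of the box.
import org.springframework.messaging.Message;
import com.amazonaws.services.s3.AmazonS3;

public class S3CleanupHandler {

    private final AmazonS3 amazonS3;

    public S3CleanupHandler(AmazonS3 amazonS3) {
        this.amazonS3 = amazonS3;
    }

    // Wired as one more subscriber on the publish-subscribe-channel,
    // e.g. via <int:service-activator ref="s3CleanupHandler"/>.
    // The header names below are made up for illustration.
    public void removeSource(Message<?> message) {
        String bucket = (String) message.getHeaders().get("s3_bucket");
        String key = (String) message.getHeaders().get("s3_key");
        amazonS3.deleteObject(bucket, key);
    }
}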

Related

Transferring multiple files through Informatica FTP connection

I have a requirement to generate the target file in Informatica with the date/time appended to its name. How will the Informatica FTP connection identify such a dynamic file name with the date appended to it?
I would also like to know whether it is possible to FTP multiple files at a time via an Informatica FTP connection. Could someone please help me with this?
It's actually pretty simple: you just have to use the part of the file name that is constant and then add a *.
For example:
Myfile_20190607.txt
Myfile_20190507.txt
If I specify Myfile_2019*, that is good enough to pick up the files listed above. You may have to play with the * and the criteria to fit the files that you need.
Note: if you are sending files to a third party, try to use SFTP instead of plain old FTP; most organizations block FTP to outside IPs.
As far as I know, up to Informatica 9.x it is neither possible to generate a dynamic filename nor to create multiple files using an FTP connection. The only option was to create the files on the Informatica server and then run a script to FTP them over to the destination server.
Here is how:
Edit your workflow, choose the Variables tab, and create a workflow variable with datatype NSTRING; assume the variable name is $wf_timestamp.
Create an Assignment task and assign TO_CHAR(SYSDATE,'YYYYMMDD') to the variable in that Assignment task.
Edit the session: choose the Mapping tab, choose your target, then Connections, then edit the FTP Value; in the Remote Filename attribute, enter your filename with the timestamp, e.g. myfile_$$$wf_timestamp.csv.
Put your Assignment task before your session in your workflow.
That's it.

Spring batch job starts processing a file not fully uploaded to the SFTP server

I have a spring-batch job scanning the SFTP server at a given interval. When it finds a new file, it starts processing it.
It works fine in most cases, but there is one case where it doesn't:
The user starts uploading a new file to the SFTP server
The batch job checks the server and finds a new file
It starts processing it
But since the file is still being uploaded, the processing encounters an unexpected end of input block and an error occurs.
How can I check that the file was fully uploaded to the SFTP server before the batch job starts processing it?
Locking files while uploading / Upload to temporary file name
You may have an automated system monitoring a remote folder, and you want to prevent it from accidentally picking up a file that has not finished uploading yet. As the majority of SFTP and FTP servers (WebDAV being an exception) do not support file locking, you need to prevent the automated system from picking up the file by other means.
Common workarounds are:
Upload a “done” file once an upload of data files finishes and have the automated system wait for the “done” file before processing the data files. This is an easy solution, but it won't work in a multi-user environment.
Upload data files to a temporary (“upload”) folder and move them atomically to the target folder once the upload finishes.
Upload data files under a distinct temporary name, e.g. with a .filepart extension, and rename them atomically once the upload finishes. Have the automated system ignore the .filepart files (see the filter sketch after this list).
Got from here
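Applied to a Java-based consumer, the last workaround amounts to a filter that skips the temporary names. A minimal sketch, assuming the .filepart convention quoted above:
import java.io.File;
import java.io.FilenameFilter;

// Accepts only files that are no longer sitting under their temporary upload name.
public class IgnoreFilePartFilter implements FilenameFilter {

    @Override
    public boolean accept(File dir, String name) {
        return !name.endsWith(".filepart");
    }
}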
We had a similar problem. Our solution was to configure the spring-batch cron trigger to fire the job every 10 minutes (though we could have configured 5 minutes, as the file transfer was taking less than 3 minutes), and then read/process only the files created more than 10 minutes earlier. We assume the FTP operation completes within 3 minutes. This also gave us some additional flexibility, for example when the spring-batch app was down.
For example, if the batch job is triggered at 10:20 AM it reads all the files that were created before 10:10 AM; likewise, the job that runs at 10:30 reads all the files created before 10:20.
Note: once a file is read, you need to either delete it or move it to a history folder to avoid duplicate reads.
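A rough sketch of that "older than the window" check as a plain java.io.File filter; the 10-minute window is just the value from the answer above and would have to match your trigger interval:
import java.io.File;
import java.io.FileFilter;
import java.util.concurrent.TimeUnit;

// Accepts only files whose last modification is older than the configured window,
// so files that may still be transferring are skipped until a later run.
public class OlderThanFilter implements FileFilter {

    private final long windowMillis;

    public OlderThanFilter(long window, TimeUnit unit) {
        this.windowMillis = unit.toMillis(window);
    }

    @Override
    public boolean accept(File file) {
        return file.lastModified() <= System.currentTimeMillis() - windowMillis;
    }
}

// Usage: new OlderThanFilter(10, TimeUnit.MINUTES)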

int-file:outbound-gateway ignores duplicate file names - does the outbound-gateway keep state in memory?

Thanks for your attention. I am using Spring Integration in my project. I want to retrieve files from servers into a tmp folder with int-ftp:inbound-channel-adapter and then move the files to their original folder with int-file:outbound-gateway for later batch processing. But it seems that when a file name is a duplicate, int-file:outbound-gateway does not work for me: it does not transmit the file and appears to ignore it. How can I solve this problem?
<int-file:outbound-gateway id="tmp-mover"
request-channel="ready-to-process-inbound-tmp-mover"
reply-channel="ready-to-process-inbound"
directory="${backupRootPath}/ali/in//"
mode="REPLACE" delete-source-files="true"/>
Set the local-filter in the ftp inbound channel adapter to an AcceptAllFileListFilter. By default, it's an AcceptOnceFileListFilter.
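For example, with Java configuration the filter could be declared as a bean and referenced from the adapter's local-filter attribute; this is only a sketch, and the bean name localAcceptAllFilter is made up here:
import java.io.File;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.file.filters.AcceptAllFileListFilter;
import org.springframework.integration.file.filters.FileListFilter;

@Configuration
public class FtpLocalFilterConfig {

    // Referenced as local-filter="localAcceptAllFilter" on <int-ftp:inbound-channel-adapter>,
    // replacing the default AcceptOnceFileListFilter so duplicate file names are passed on again.
    @Bean
    public FileListFilter<File> localAcceptAllFilter() {
        return new AcceptAllFileListFilter<>();
    }
}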

What's the best way to (programmatically) determine a file's network origin?

For an application I'm writing, I want to programmatically find out which computer on the network a file came from. How can I best accomplish this?
Do I need to monitor network transactions or is this data stored somewhere in Windows?
When a file is copied to the local system, Windows does not keep any record of where it was copied from. So unless the application that created it saved such information in the file, it will be lost.
With file auditing, file and directory operations can be tracked, but I don't think that includes the source path for file copies (just who created the file and when).
Yes, it seems like you would either need to detect the file transfer based on interception of network traffic, or if you have the ability to alter the file in some way, use public key cryptography to sign files using a machine-specific key before they are transferred.
Create a service on either the destination computer or the file-hosting computers that adds records to an Alternate Data Stream attached to each file, much the way Windows handles the zone information (Zone.Identifier) for files downloaded from the internet.
You can have a background process on machine A which "tags" each file as having been tagged by machine A on such-and-such a date and time. Then when machine B downloads the file, assuming we are using NTFS filesystems, it can see the tag from A. Or, if you can't have a process at the server, you can use NTFS streams on the "client" side via packet sniffing methods as others have described. The bonus here is that future file-copies will retain the data as long as it is between NTFS systems.
Alternative: require that all file transfers be done through a web portal (as opposed to network drag-and-drop), which gives you built-in logging, or through some other type of file-retrieval proxy. Do you have control over procedures like this?

Efficiently creating tar files

Note: I'm using Windows file servers and .NET
If I were to create a TAR file from files on a remote file server (meaning, the TAR file would be created on the remote file server, where the original files are), would the bytes need to come to my machine and then go back to the file server (since my machine is running the code that's generating the TAR), or would they stay on the file server? I'm asking about the best possible (theoretical) implementation.
Thank you!
The bytes need to be where they are processed.
If you process them on your own machine, they must be transferred there.
If you process them on the file server, they don't need to be transferred.
If your goal is to minimize bandwidth usage, your best bet is to have a script on the file server that generates the tar files for you when triggered from your machine.
The best possible implementation really depends on what your goals and constraints are.
The bytes would have to be read into your machine. The only way I know to do the tarring on the remote server is to have the remote server itself generate the TAR. For example, you could connect via SSH and run a shell command on the remote server.
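As a rough illustration of that approach, assuming the file server exposes SSH and has a tar binary available; the host, user, and paths below are placeholders:
import java.io.IOException;

public class RemoteTar {

    public static void main(String[] args) throws IOException, InterruptedException {
        // Runs tar on the file server itself, so the file bytes never leave that machine;
        // only the command and its exit status travel over the network.
        Process ssh = new ProcessBuilder(
                "ssh", "user@fileserver",
                "tar -cf /data/archive.tar -C /data files/")
                .inheritIO()
                .start();
        int exitCode = ssh.waitFor();
        System.out.println("remote tar finished with exit code " + exitCode);
    }
}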
Unfortunately, in the scenario described, the TAR operation will use network bandwidth. You need to run the tar program on the file server to avoid using bandwidth.
