Talend - Read files from several FTP servers

I have several FTP servers (4 of them) where files are generated by an application.
The application generates the same type of file, with the same structure, on all 4 servers.
With Talend, whenever a file changes on one of the servers, I want to retrieve its data and put it into ActiveMQ.
What would you suggest? With tFTP I don't have a tWaitForFile.

Staying within that architectural approach... you could poll the FTP servers to detect a change in a file's modification timestamp or size.
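For illustration, here's a minimal Go sketch of that polling approach, assuming the github.com/jlaffaye/ftp client library; the hosts, credentials, directory, and polling interval are all placeholders, and the ActiveMQ publishing step is left as a comment:

    // Poll each FTP server and report entries whose size or modification
    // time changed since the last pass.
    package main

    import (
        "fmt"
        "log"
        "time"

        "github.com/jlaffaye/ftp" // assumed FTP client library
    )

    type fileState struct {
        size uint64
        mod  time.Time
    }

    func poll(addr string, seen map[string]fileState) {
        c, err := ftp.Dial(addr, ftp.DialWithTimeout(10*time.Second))
        if err != nil {
            log.Printf("%s: %v", addr, err)
            return
        }
        defer c.Quit()
        if err := c.Login("user", "password"); err != nil { // placeholder credentials
            log.Printf("%s: %v", addr, err)
            return
        }
        entries, err := c.List("/outbound") // placeholder directory
        if err != nil {
            log.Printf("%s: %v", addr, err)
            return
        }
        for _, e := range entries {
            if e.Type != ftp.EntryTypeFile {
                continue
            }
            prev, known := seen[e.Name]
            if !known || prev.size != e.Size || !prev.mod.Equal(e.Time) {
                fmt.Printf("%s: %s changed\n", addr, e.Name)
                // Here you would fetch the file and publish it to ActiveMQ.
            }
            seen[e.Name] = fileState{e.Size, e.Time}
        }
    }

    func main() {
        servers := []string{"ftp1.example.com:21", "ftp2.example.com:21"} // placeholders
        states := make(map[string]map[string]fileState)
        for {
            for _, s := range servers {
                if states[s] == nil {
                    states[s] = make(map[string]fileState)
                }
                poll(s, states[s])
            }
            time.Sleep(30 * time.Second)
        }
    }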

Related

Golang file and folder replication / mirroring across multiple servers

Consider this scenario. In a load-balanced environment, I have 3 separate instances of a CMS running on 3 different physical servers. These 3 running instances of the application share the same database.
On each server, the CMS has a /media folder where all media subfolders and files reside. My question is how I'd implement/code a file replication service/functionality in Golang, so when a subfolder or file is added/changed/deleted on one of the servers, it'll get copied/replicated/deleted on all other servers?
What packages would I need to look in to, or perhaps you have a small code snippet to help me get started? That would be awesome.
Edit:
This question has been marked as "duplicate", but it is not. It is however an alternative to setting up a shared network file system. I'm thinking that keeping a copy of the same file on all servers, synchronizing and keeping them updated might be better than sharing them.
You probably shouldn't do this. Use a distributed file system, object storage (such as S3 or GCS), or a syncing program like btsync or syncthing.
If you still want to do this yourself, it will be challenging. You are basically building a distributed database and they are difficult to get right.
At first blush you could check out something like etcd or Raft, but unfortunately etcd doesn't work well with large files.
You could, on upload, also copy the file to every other server using ssh. But then what happens when a server goes down? Or what happens when two people update the same file at the same time?
Maybe you could design it such that every file gets a unique id (perhaps based on the hash of its contents so you can safely dedupe) and those files can never be updated or deleted, only added. That would solve the simultaneous update problem, but you'd still have the downtime problem.
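As a sketch of that content-addressed idea, the ID can simply be a SHA-256 of the file's bytes, so identical files dedupe to the same ID and an ID never has to change (the path below is a placeholder):

    // Derive a file's ID from a SHA-256 hash of its contents.
    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "io"
        "os"
    )

    func fileID(path string) (string, error) {
        f, err := os.Open(path)
        if err != nil {
            return "", err
        }
        defer f.Close()
        h := sha256.New()
        if _, err := io.Copy(h, f); err != nil {
            return "", err
        }
        return hex.EncodeToString(h.Sum(nil)), nil
    }

    func main() {
        id, err := fileID("example.jpg") // placeholder path
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Println("file id:", id)
    }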
One approach would be for each server to maintain an append-only version log when a file is added:
VERSION | FILE HASH
1 | abcd123
2 | efgh456
3 | ijkl789
With that, you can pull every file from a server, and a single number is sufficient to know when a file has been added. (For example, if you think Server A is at version 5 and you are informed it is now at version 7, you know you need to sync 2 files.)
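A minimal Go sketch of such a log might look like this (the type and method names are made up for illustration); comparing a peer's version number with the last one you saw tells you exactly which hashes to fetch:

    // An append-only version log: version N corresponds to the Nth hash.
    package main

    import "fmt"

    type Log struct {
        hashes []string // index i holds the file hash for version i+1
    }

    func (l *Log) Append(hash string) uint64 {
        l.hashes = append(l.hashes, hash)
        return uint64(len(l.hashes)) // the new version number
    }

    // Since returns the hashes added after the given version.
    func (l *Log) Since(version uint64) []string {
        if version >= uint64(len(l.hashes)) {
            return nil
        }
        return l.hashes[version:]
    }

    func main() {
        var serverA Log
        serverA.Append("abcd123")
        serverA.Append("efgh456")
        serverA.Append("ijkl789")

        // We last saw server A at version 1, so two files need syncing.
        for _, h := range serverA.Since(1) {
            fmt.Println("need to fetch file", h)
        }
    }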
You could do this with a database table:
ID | LOCAL_SERVER_ID | REMOTE_SERVER_ID | VERSION | FILE HASH
which you could poll periodically, doing your syncing between machines via SSH or HTTP. If a server were down, you could just retry until it comes back.
Or, if you didn't want to have a centralized database for this, you could use a library like memberlist. The local metadata for each node could be its version.
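For example, a hedged sketch using hashicorp/memberlist could publish each node's log version as its node metadata via the Delegate interface; the cluster address and version below are placeholders:

    // Gossip each node's log version as memberlist node metadata;
    // peers compare versions to decide what to sync.
    package main

    import (
        "fmt"
        "log"
        "strconv"

        "github.com/hashicorp/memberlist"
    )

    type delegate struct{ version uint64 }

    // NodeMeta is gossiped to other members; we publish our log version.
    func (d *delegate) NodeMeta(limit int) []byte {
        return []byte(strconv.FormatUint(d.version, 10))
    }
    func (d *delegate) NotifyMsg([]byte)                           {}
    func (d *delegate) GetBroadcasts(overhead, limit int) [][]byte { return nil }
    func (d *delegate) LocalState(join bool) []byte                { return nil }
    func (d *delegate) MergeRemoteState(buf []byte, join bool)     {}

    func main() {
        cfg := memberlist.DefaultLANConfig()
        cfg.Delegate = &delegate{version: 7} // our current log version
        list, err := memberlist.Create(cfg)
        if err != nil {
            log.Fatal(err)
        }
        // Join an existing cluster via any known member (placeholder host).
        if _, err := list.Join([]string{"10.0.0.2"}); err != nil {
            log.Println(err)
        }
        for _, m := range list.Members() {
            fmt.Printf("%s is at version %s\n", m.Name, m.Meta)
        }
    }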
Either way, there will be some delay between when a file is uploaded to a single server and when it's available on all of them. Handling that well is hard, which is why you probably shouldn't do this.

Generate a CSV file in Salesforce and transfer it to another server

I have generated an XML file in Salesforce and now my problem is that I want to transfer this file to another server. Can I connect to another server using FTP, or is there any other way to transfer the file?
This is an urgent task.
Any solutions would be gratefully accepted.
Phaniraj N
You can't FTP out of Salesforce. Where do you store the file in Salesforce? What's the server on the other end?
You could serve the file up as a public page over Sites, call a custom web service to transfer it, implement a web service to serve up the file, or use Data Loader to extract from Salesforce on a regular basis and then a batch job to upload to the other server. The list goes on, but we'll need a bit more information on what you're dealing with!

Syncing a file from a client to a server

I'm trying to keep a file updated in real time with the server. It's more like real-time syncing with a very small delay. Is there any application that lets me do this? Or would you suggest using a local host as a server?
I don't know how you are connected to your server, but I assume it's something like SCP/SFTP/FTP, and I don't know your OS. WinSCP will do exactly what you need: you can set it to watch your filesystem (a specified folder), and it will update the server files as soon as a file on your drive changes.
It also supports command-line scripting, so you can use it from within your own applications.
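For instance, assuming WinSCP's scripting interface, something like the following should keep a remote folder in sync with a local one (host, credentials, and paths are placeholders; check the WinSCP docs for the exact syntax of your version):

    winscp.com /command ^
        "open ftp://user:password@example.com/" ^
        "keepuptodate C:\local\folder /remote/folder" ^
        "exit"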

What's the best way to (programmatically) determine a file's network origin?

For an application I'm writing, I want to programmatically find out what computer on the network a file came from. How can I best accomplish this?
Do I need to monitor network transactions or is this data stored somewhere in Windows?
When a file is copied to the local system, Windows does not keep any record of where it was copied from. So unless the application that created it saved such information in the file, it is lost.
With file auditing, file and directory operations can be tracked, but I don't think that will include the source path for file copies (just who created them and when).
Yes, it seems like you would either need to detect the file transfer by intercepting network traffic or, if you have the ability to alter the file in some way, use public-key cryptography to sign files with a machine-specific key before they are transferred.
Create a service on either the destination computer or on the file-hosting computers which will add records to an Alternate Data Stream attached to each file, much the way Windows uses the Zone.Identifier stream for files downloaded from the internet.
You can have a background process on machine A which "tags" each file as having been tagged by machine A on such-and-such a date and time. Then when machine B downloads the file, assuming both use NTFS filesystems, it can see the tag from A. Or, if you can't run a process on the server, you can write NTFS streams on the "client" side via the packet-sniffing methods others have described. The bonus here is that future file copies will retain the data as long as they stay between NTFS systems.
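As a sketch of that tagging idea in Go (Windows/NTFS only): the os package can address an alternate data stream with the file:stream syntax, so a background process could write and read an origin tag like this. The stream name "origin" and the tag format are illustrative, not any Windows standard:

    // Tag a file's origin in an NTFS alternate data stream.
    package main

    import (
        "fmt"
        "log"
        "os"
        "time"
    )

    func main() {
        const file = `C:\share\report.docx` // placeholder path

        // Write the tag into an alternate data stream named "origin".
        tag := fmt.Sprintf("tagged by MACHINE-A at %s", time.Now().Format(time.RFC3339))
        if err := os.WriteFile(file+":origin", []byte(tag), 0644); err != nil {
            log.Fatal(err)
        }

        // Any NTFS-aware reader can retrieve the tag later.
        data, err := os.ReadFile(file + ":origin")
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(data))
    }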
Alternative: create a requirement that all file transfers must be done through a web portal (as opposed to network drag-and-drop), with built-in logging. Or some other type of file-retrieval proxy. Do you have control over procedures like this?

Efficiently creating tar files

Note: I'm using Windows file servers and .NET
If I were to create a TAR file from files on a remote file server (meaning, the TAR file would be created on the remote file server, where the original files are), would the bytes need to come to my machine and then go back to the file server (since my machine is running the code that's generating the TAR), or would they stay on the file server? I'm asking about the best possible (theoretical) implementation.
Thank you!
The bytes need to be where they are processed.
If you process them on your own machine, they must be transferred.
If you process them on the file server, they don't need to be transferred.
If your goal is to minimize bandwidth usage, your best bet would be a script on the file server that generates the tar files for you when triggered from your machine.
The best possible implementation really depends on what your goals and constraints are.
The bytes would have to be read into your machine. The only way I know to do the tarring on the remote server alone is to have the remote server generate the TAR itself. For example, you could connect via SSH and run a shell command on the remote server.
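As a hedged sketch of that SSH approach in Go, using golang.org/x/crypto/ssh (the same pattern applies from .NET with an SSH client library; host, credentials, and paths are placeholders, and the server must have tar available):

    // Run tar on the file server over SSH so the file bytes never
    // cross the network; only the command and its output travel.
    package main

    import (
        "log"

        "golang.org/x/crypto/ssh"
    )

    func main() {
        cfg := &ssh.ClientConfig{
            User:            "user",
            Auth:            []ssh.AuthMethod{ssh.Password("password")},
            HostKeyCallback: ssh.InsecureIgnoreHostKey(), // don't do this in production
        }
        client, err := ssh.Dial("tcp", "fileserver.example.com:22", cfg)
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        session, err := client.NewSession()
        if err != nil {
            log.Fatal(err)
        }
        defer session.Close()

        // The archive is created in place on the server.
        out, err := session.CombinedOutput("tar -cf /backups/archive.tar -C /data .")
        if err != nil {
            log.Fatalf("%v: %s", err, out)
        }
    }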
Unfortunately, in the scenario described, the TAR operation will use network bandwidth. To avoid that, you need to run the tar program on the file server itself.
