Calculate file checksum in FTP server using Apache FtpClient - ftp

I am using FTPClient from Apache Commons Net to upload videos to an FTP server. To check whether a file has really been transferred successfully, I want to calculate the checksum of the remote file, but I could not find any related API.
My question is: is there a need to calculate the file checksum on the FTP server? If yes, how do I get the checksum with FTPClient?
If not, how does FTPClient know whether the file has been transferred successfully and completely?

With FTP, I'd recommend verifying the upload, if possible.
The problem is that there is no widespread standard API for calculating a checksum over FTP.
There have been many proposals for a checksum command for FTP. None has been accepted yet.
The latest proposal is:
https://datatracker.ietf.org/doc/html/draft-bryan-ftpext-hash-02
As a consequence, different FTP servers support different checksum commands with different syntax: HASH, XSHA1, XSHA256, XSHA512, XMD5, MD5 and XCRC, to name some. You need to check which of them, if any, your FTP server supports.
You can test that with WinSCP, which supports all of the commands mentioned above. Try its checksum calculation function or its checksum scripting command. If they work, enable logging and check which command and syntax WinSCP uses against your server.
> 2015-04-28 09:19:16.558 XSHA1 /test/file.dat
< 2015-04-28 09:19:22.778 213 a98faefdb2c36ca352a2d9b01668aec6b641cf4b
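Then execute the command using the Apache Commons Net sendCommand method:
// needs: import org.apache.commons.net.ftp.FTPReply;
// sendCommand returns the reply code; a 2xx code means the command succeeded
if (FTPReply.isPositiveCompletion(ftpClient.sendCommand("XSHA1", "filename")))
{
    // reply lines from the server, e.g. "213 a98faefdb2c36ca352a2d9b01668aec6b641cf4b"
    String[] reply = ftpClient.getReplyStrings();
}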
(I'm the author of WinSCP)
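Once the server has returned a checksum, it still has to be compared with one computed locally. That comparison is independent of the FTP library, so here is a minimal Python sketch of it. It assumes the reply looks like the log above (a 2xx code followed by the hex digest as the last token) and that the command was XSHA1; other servers may format the reply differently. The function names are made up for illustration.

```python
import hashlib


def local_sha1(path, chunk_size=1 << 20):
    """Hex SHA-1 of a local file, read in chunks so large videos are not loaded whole."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_from_reply(reply_line):
    """Extract the digest from a reply such as '213 a98f...'.

    Assumption: the digest is the last whitespace-separated token of a
    2xx reply. This is not standardized and may differ per server.
    """
    tokens = reply_line.strip().split()
    if len(tokens) < 2 or not tokens[0].startswith("2"):
        raise ValueError("not a positive checksum reply: %r" % reply_line)
    return tokens[-1].lower()


def upload_matches(path, reply_line):
    """True if the server-reported digest equals the local file's SHA-1."""
    return digest_from_reply(reply_line) == local_sha1(path)
```

With Commons Net, the reply line would be the string obtained from getReplyStrings() after a successful XSHA1 command.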
If your server does not support any of the checksum commands, you do not have many options:
Download the file back and check it locally.
When using encryption (TLS/SSL), chances of the file being corrupted during transfer are significantly lower. The receiving party (server in this case) would otherwise fail to decrypt the data. So if you are sure that the file transfer completed (no decryption errors and the size of the uploaded file is the same as size of the original local file), you can be pretty sure that the uploaded file is correct.

Just an addition on how I implemented this. When dealing with standard FTP servers without any additional modules loaded for checksum checking, all I did was create a list of MD5/CRC hashes, one per file, in an SFV file, say uploads.sfv (in the same format an SFV generator would produce). This allows you to do further checksum checks.
Examples of server-side checksum support:
PZS-ng for cuftpd, glftpd
mod_digest for ProFTPD
Of course, as @MartinPrikryl highlighted, none of these are standardized.

That's a long shot, but if the server supports php, you can exploit that.
Save the following as a php file (say, check.php), in the same folder as your name_of_file.txt file:
<?php
echo md5_file('name_of_file.txt');
?>
Then, visit the page check.php, and you should get the md5 hash of your file.
Related questions:
FTP: copy, check integrity and delete
How to perform checksums during a SFTP file transfer for data integrity?
https://serverfault.com/q/98597/401691

Related

How to download a CSV from a HTTPS URL to file using Pentaho Data Integration - Spoon (Kettle)?

When googling this question, it seems to have been asked, and partially (and poorly) answered a number of times, mostly for older versions.
Question: How can I download a CSV to a local file, with the below constraints? I'm designing in Spoon.
URL: Will always be the same. https://example.com/data/my.csv . The website prepares the csv and provides it back to the web client as a file download after about 4-5 seconds. In a browser this means it is downloaded as a .csv, and not displayed.
Authentication: The website does not require authentication for access. The data isn't sensitive.
Local file path: The downloaded CSV will overwrite the existing csv. eg: d:\data\my.csv . Ie, I can set this on a timer and have it download the newest csv every hour or so.
Proxy: It is quite likely I will need to traverse a network proxy. eg badproxy.mynetwork.internal:8080 and that proxy requires a username and password. It's far better if I can set this password in a single location so any future things created can reference it. Not really sure on how to approach this either.
The rest of my process focuses on addressing the content of the csv, and already works fine.
The processes I've found on google show using the Http Client component, though it's not particularly straightforward how this translates into a file being saved locally into a known location.
Thanks for any pointers.
PDI v9.0.0.0-423
The HTTP client step needs to be triggered. Use a Row generator step generating e.g. 1 empty row and link that with a hop to the HTTP client step.
For your solution, try this:
Data Grid --> HTTP Client --> CSV File Input --> Text file output (with a .csv extension)

Generating an MD5 for flute software

Hi, I have a piece of software called MAD FLUTE. It sends files from a server to several clients using multicast, with FEC. The problem is that when I send a list of files in the form of an FDT_TSI.xml, it adds the MD5 of each file so that the receiver can check whether the files are correct.
Since I'm building the FDT file with a script, I haven't found a way to generate the MD5 the way the software does, e.g.
Content-MD5="c+kuA8jR7esYd1k1PZgVJw==", even if I use openssl-md5 to generate it.
What can I do?
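One likely explanation, offered as an assumption rather than a confirmed description of MAD FLUTE: the value above is 24 Base64 characters, which decodes to 16 bytes, the size of a raw MD5 digest. Content-MD5 values are conventionally the Base64 encoding of the binary digest, whereas md5sum and openssl print the digest as 32 hex characters by default. A minimal Python sketch of both conversions (function names are illustrative):

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Base64 of the raw 16-byte MD5 digest (not of the hex string)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def hex_md5_to_base64(hex_digest: str) -> str:
    """Convert a hex MD5, as printed by md5sum or openssl, to Base64."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")
```

On the command line, the equivalent is usually to ask openssl for binary output and Base64-encode that, for example openssl dgst -md5 -binary FILE | openssl base64.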

How to verify if upload is finished in SFTP [duplicate]

This question already has answers here:
How to confirm SFTP file delivery?
(3 answers)
Closed 1 year ago.
I'm uploading a file over SFTP to a destination server using bash scripts.
How can I be sure that the uploaded file is complete, given that sftp may not return anything or the network connection could break?
I see that I can get the size of the file before uploading it and then compare it with the size of the file on the server.
Could you suggest other, better options?
Thank you.
I think getting the size is a good option.
What I could imagine :
Client side :
- Put the size of the file, and its md5 in a file, like ".fileinfo"
- Send the fileinfo to the server
- Send the (interesting) File to the server
Server side :
- Check periodically files of a folder (with "watch ls" command for example)
- If a ".fileinfo" exists, read it and check whether the size matches an existing file of the same name (without ".fileinfo"). If the size matches, run "md5sum" on the file and check whether it matches too. If yes, move your file into its final destination folder and delete the ".fileinfo" file. If not, repeat.
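The two sides of this scheme can be sketched in Python as follows. This is only an illustration under assumptions: the sidecar is named after the data file with a ".fileinfo" suffix and contains the size and the MD5 separated by a space. Neither detail is fixed by the description above, and the function names are made up.

```python
import hashlib
import os

SUFFIX = ".fileinfo"  # assumed sidecar naming: "<data file name>.fileinfo"


def file_md5(path, chunk_size=1 << 20):
    """Hex MD5 of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_fileinfo(path):
    """Client side: record '<size> <md5>' next to the file before uploading both."""
    info_path = path + SUFFIX
    with open(info_path, "w") as f:
        f.write("%d %s\n" % (os.path.getsize(path), file_md5(path)))
    return info_path


def upload_is_complete(info_path):
    """Server side: True once the data file matches its sidecar."""
    data_path = info_path[: -len(SUFFIX)]
    with open(info_path) as f:
        size_text, expected_md5 = f.read().split()
    if not os.path.exists(data_path):
        return False  # data file not uploaded yet
    if os.path.getsize(data_path) != int(size_text):
        return False  # still uploading, or truncated
    return file_md5(data_path) == expected_md5
```

A polling loop on the server would call upload_is_complete for each sidecar it finds and move the data file only when it returns True. The cheap size check runs first so the MD5 is computed only for files that look finished.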
Many software download sites provide both the software and its checksum.
We can use the same technique to check an uploaded file:
upload the file together with its checksum, then on the server side compare the file's checksum with the uploaded checksum.
If the two don't match, you know that one of the following holds:
The uploaded file is corrupted, or
The uploaded checksum is corrupted, or
Both the checksum and the file are corrupted.
Test the exit code of sftp. If it returns 0, you can be pretty sure that everything is OK (assuming you are using OpenSSH sftp). This works only when you use the -b switch (which I assume you are doing).
The SFTP protocol allows checksum calculation, but I suppose you are stuck with OpenSSH (on either or both sides), which does not support it.
To be 100% sure, you can download the file back and compare with original.

Efficiently creating tar files

Note: I'm using Windows file servers and .NET
If I were to create a TAR file from files on a remote file server (meaning, the TAR file would be created on the remote file server, where the original files are), would the bytes need to come to my machine and then go back to the file server (since my machine is running the code that's generating the TAR), or would they stay on the file server? I'm asking about the best possible (theoretical) implementation.
Thank you!
The bytes need to be where they are processed.
If you process them on your remote system, they must be transferred.
If you process them on your server, they don't need to be transferred.
If your goal is to minimize bandwidth usage, your best bet would be to have a script on your server that will generate the tar files for you when triggered by your remote system.
The best possible implementation really depends on what your goals and constraints are.
The bytes would have to be read into your machine. The only way I know that you can just do the TARing on the remote server is to have the remote server generate the TAR. For example, you could connect via SSH and run a shell command on the remote server.
Unfortunately, in the scenario described, the TAR operation will use network bandwidth. You need to run the tar program on the file server to avoid using bandwidth.

How do I verify the integrity of a Sybase dump file, without trying to load it?

Here's the scenario: a client uploads a (gzipped) Sybase dump file to our local FTP server. We have an automated process which picks these files up and moves them to a different server within the network, where the database server resides. Unfortunately, this transfer is over a WAN, which takes a long time for large files, and sometimes our clients forget to FTP in binary mode. That results in 10GB of transfer over our WAN, all for nothing, as the dump file can't be loaded at the other end. What I'd like to do is verify the integrity of the dump file on the local server before sending it out over the WAN, but I can't just try to "load" the dump file, as we don't have Sybase installed (and can't install it). Are there any tools or bits of code that I can use to do this?
There are a few things you can do from the command line. The first, on the sending side, is to generate md5sums of the files.
$ md5sum *.dmp
2bddf3cd8b04010183dd3295ce7594ff pubs_1.dmp
7510e0250c8d68bae3e0e794c211e60b pubs_2.dmp
091fe54fa5fd81d8c109cc7835d37f4a pubs_3.dmp
On the receiving side, run the same command and compare the results. Secondly, Sybase dumps are usually done with the compress option. If this option is used, you can also test the file integrity by uncompressing the files via the command line. This isn't as complete, but it will verify the CRC-32 checksum and uncompressed length stored in the 8-byte gzip trailer.
$ gunzip --test *.dmp
gunzip: pubs_3.dmp: unexpected end of file
Neither of these methods validates that Sybase will be able to load the file, but they do help ensure the file isn't corrupt.
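If the check has to run inside an automated pickup process rather than from a shell, the same gzip test can be sketched in Python. This assumes the dump really is gzip-compatible, as the gunzip example implies; the function name is made up for illustration. Like gunzip --test, it says nothing about whether Sybase can load the dump.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress the whole file and discard the output, like `gunzip --test`.

    Reading to the end makes the gzip module check the CRC-32 and length
    stored in the trailer. A truncated file raises EOFError; a bad header
    or CRC raises an OSError subclass; damaged deflate data can raise
    zlib.error. A missing or unreadable file also yields False, since
    those errors are OSError too.
    """
    try:
        with gzip.open(path, "rb") as f:
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False
```

Running this on the local FTP server before the WAN transfer would catch truncated uploads and most files mangled by an ASCII-mode transfer, because both break the compressed stream or its trailer.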
There is no way to really verify the integrity of the dump file without having a backup server load it in some way. The client should know whether the dump was successful from the backup log or the output during the dump.
But to solve your problem, you should switch to SFTP or SCP, where all transfers are done in binary, which eliminates the problem.
Also ensure that they are using compression in the dump; a value of 1-3 is more than enough. This should reduce your network traffic as well.

Resources