cwRsync, network drive, file modification time issue - windows

I'm using cwRsync 5.4.1 x86 Free under Windows and trying to sync a folder to a network drive.
I execute the following command:
rsync.exe -rLtv --delete --ignore-errors "/cygdrive/d/1/" "/cygdrive/z/ZipNB/"
where D is a local drive and Z is a network drive (an external HDD connected to an RT-N16 router).
Executing it several times gives the same result:
>rsync.exe -rLtv --delete --ignore-errors "/cygdrive/d/1/" "/cygdrive/z/ZipNB/"
sending incremental file list
./
1.pdf
sent 11,893,922 bytes received 38 bytes 1,829,840.00 bytes/sec
total size is 11,890,918 speedup is 1.00
I have one file in the folder, and rsync sends its full content on every run. The file is the same each time and was not changed between runs.
If I add the --size-only parameter, it works as expected:
>rsync.exe -rLtv --delete --ignore-errors --size-only "/cygdrive/d/1/" "/cygdrive/z/ZipNB/"
sending incremental file list
./
sent 72 bytes received 22 bytes 188.00 bytes/sec
total size is 11,890,918 speedup is 126,499.13
DIR for both directories:
D:\1>dir
Volume in drive D is XXX
Volume Serial Number is XXXX-XXX
Directory of D:\1
08.12.2016 10:04 <DIR> .
08.12.2016 10:04 <DIR> ..
24.11.2016 18:31 11 890 918 1.pdf
1 File(s) 11 890 918 bytes
Z:\ZipNB>dir
Volume in drive Z is BackUp (at Portable)
Volume Serial Number is XXXX-XXX
Directory of Z:\ZipNB
08.12.2016 10:04 <DIR> .
08.10.2016 20:40 <DIR> ..
24.11.2016 18:31 11 890 918 1.pdf
1 File(s) 11 890 918 bytes
As far as I know, rsync by default decides whether a file has changed by comparing modification time and size. Both files seem identical, but cwRsync apparently gets or sets the wrong modification time on the file on the Z drive. cwRsync works correctly if both directories are on a local drive; the problem occurs only with the network drive.
In the Windows file properties, the modification times differ by 1 second, which may be causing the problem.
I used a single file only to simplify the output; the situation is the same with any number of different files. It always sends the full content of each file.
What can be wrong here, and how can I fix it?

I'm guessing the HDD on the network share uses FAT, because according to File Times:
For example, the resolution of create time on FAT is 10 milliseconds,
while write time has a resolution of 2 seconds and access time has a
resolution of 1 day, so it is really the access date.
That would explain the time difference.
For this kind of situation, rsync has the --modify-window option:
-@, --modify-window
When comparing two timestamps, rsync treats the timestamps as being equal if they differ by no more than the modify-window value.
The default is 0, which matches just integer seconds. If you specify a
negative value (and the receiver is at least version 3.1.3) then
nanoseconds will also be taken into account. Specifying 1 is useful
for copies to/from MS Windows FAT filesystems, because FAT represents
times with a 2-second resolution (allowing times to differ from the
original by up to 1 second).
So try adding -@1 (or the long form, --modify-window=1) to your command.
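To see why a window of 1 helps, here is a minimal Python sketch of the comparison. It is only an illustration of the idea, not rsync's actual code: the function names are made up, and it assumes the destination rounds the stored modification time up to the next even second (the rounding direction is an assumption; the point is only that the stored value can be off by 1 second).

```python
def fat_round(mtime: int) -> int:
    """Round whole seconds up to the next even second (assumed FAT-style storage)."""
    return mtime + (mtime % 2)


def same_mtime(src: int, dst: int, modify_window: int = 0) -> bool:
    """Treat two timestamps as equal if they differ by at most modify_window seconds."""
    return abs(src - dst) <= modify_window


src_mtime = 1480005061            # made-up source mtime with an odd number of seconds
dst_mtime = fat_round(src_mtime)  # what a 2-second-resolution filesystem might store

# With a window of 0 the times differ, so the file would be sent again.
print(same_mtime(src_mtime, dst_mtime))
# With a window of 1 the one-second difference is tolerated.
print(same_mtime(src_mtime, dst_mtime, modify_window=1))
```

With an even source timestamp the two values would already match, which is why the problem can look intermittent across files.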

Related

Copying file in Windows 10 changes its size

I copied a large file to a new directory in Windows 10 by dragging the file from Explorer to a folder in Eclipse. The file size of the copied file changed even though fc shows the original and new files as identical. The original file has a size of 209,715,200 bytes (200 MiB):
c:\>dir c:\Users\GeoffAlexander\Documents\Python\200MiB.txt
Volume in drive C is Windows
Volume Serial Number is 0447-709A
Directory of c:\Users\GeoffAlexander\Documents\Python
08/13/2019 09:42 AM 209,715,200 200MiB.txt
1 File(s) 209,715,200 bytes
0 Dir(s) 268,331,835,392 bytes free
The new file has a size of 211,812,352 bytes:
c:\>dir c:\Users\GeoffAlexander\Desktop\200MiB.txt
Volume in drive C is Windows
Volume Serial Number is 0447-709A
Directory of c:\Users\GeoffAlexander\Desktop
08/15/2019 09:11 AM 211,812,352 200MiB.txt
1 File(s) 211,812,352 bytes
0 Dir(s) 268,232,798,208 bytes free
The fc command shows the files as being identical:
c:\>fc c:\Users\GeoffAlexander\Documents\Python\200MiB.txt c:\Users\GeoffAlexander\Desktop\200MiB.txt
Comparing files C:\USERS\GEOFFALEXANDER\DOCUMENTS\PYTHON\200MiB.txt and C:\USERS\GEOFFALEXANDER\DESKTOP\200MIB.TXT
FC: no differences encountered
Why does the copied file get a new size? How can two files with different sizes be identical? Is Windows 10 incorrectly reporting the size of the new file?
I'm running Windows 10 Enterprise Build 1809 (OS Build 17763.615) if that makes any difference.
It turns out the file size change wasn't caused by copying the file. Rather, the size changed when the file was checked in to RTC (Rational Team Concert). The RTC check-in was converting existing LF line delimiters into CRLF line delimiters (Windows line delimiters). See RTC File content types and line delimiters for details.
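The numbers fit that explanation: the two sizes differ by exactly 2,097,152 bytes, which is what you would get if the file had that many LF-terminated lines and each one gained a CR. A small Python sketch of the arithmetic (the helper name and sample data are made up for illustration):

```python
def crlf_size(data: bytes) -> int:
    """Size of data after every bare LF is rewritten as CRLF."""
    bare_lf = data.count(b"\n") - data.count(b"\r\n")
    return len(data) + bare_lf


# Tiny made-up sample: three LF-terminated lines grow by three bytes.
sample = b"line one\nline two\nline three\n"
print(len(sample), crlf_size(sample))

# Sizes reported by dir in the question.
original_size = 209_715_200
checked_in_size = 211_812_352
extra_bytes = checked_in_size - original_size
print(extra_bytes)  # one extra byte per converted line ending
```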

What is the fastest way to move data from one volume to another with MapR?

I want to move data from one volume to another. The folders and file sizes vary. Files can be up to 100 GB, but we can have also a lot of small files. If there is data in the destination volume at that particular folder, it can be overwritten.
So far, I tried (Code has been simplified for demonstration purposes)
(1) for root, directories, files in os.walk(src):
        for file in files:
            mv -v <src> <dest>
(2) hadoop distcp -overwrite -m100 <src> <dest>
Below 10 GB, the mv option is faster. At 10 GB, both options take approximately 2 minutes to transfer.

Rsync include or exclude directories using text file

I'm using rsync to back up some data from a remote host.
This is how I'm using the rsync command:
rsync --dry-run -avhi -e ssh --include-from=/home/rsync_list/test.txt root@10.10.4.61:/ /mnt/BACKUP/my_BACKUP/
This is the file /home/rsync_list/test.txt:
+ /usr/acs/conf/**
+ /usr/acs/bin/**
+ /raid0/opmdps/TEMP_folder/**
- *
I want to copy only the listed folders excluding the remaining files.
I always get
receiving file list ... done
sent 103 bytes received 48 bytes 302.00 bytes/sec
total size is 0 speedup is 0.00 (DRY RUN)
Could you tell me what I'm doing wrong? How should I write the rsync command if I would like to sync, for example, only /raid0/opmdps/TEMP_folder/ without its subfolders?
Did you only try the exact command you posted?
Not only are you using the "--dry-run" option, the output itself indicates it:
total size is 0 speedup is 0.00 (DRY RUN)
Please consult the manpage:
-n, --dry-run perform a trial run with no changes made
https://linux.die.net/man/1/rsync
May I suggest you give it a run without --dry-run?

file's date changes after zip in and out again, according to XCOPY

So, here's the problem: I have regular files that are put into a ZIP file (see below for details on ZIP). Then I unzip them (see below for details on the tool used), and the files are restored. The date of each file is restored, as is standard in the ZIP/UNZIP tools used. When queried using DIR or in Windows Explorer, the files involved have the same date as they had before being handled by the ZIP/UNZIP process.
So, all OK.
But then I use the XCOPY /D command to further manipulate different copies of those files on the disk, and XCOPY says that one file is newer than the other. Given that the date, hour, and minutes are the same, would the difference be in a smaller unit, like seconds?
All involved disks have NTFS file system.
Example:
C:\my>dir C:\windows\Background_mycomputer.cmd C:\my\directory\Background_mycomputer.cmd
Volume in drive C is mycomputerC
Volume Serial Number is 1234-5678
Directory of C:\windows
31/12/2014 19:50 51 Background_mycomputer.cmd
1 File(s) 51 bytes
Directory of C:\my\directory
31/12/2014 19:50 51 Background_mycomputer.cmd
1 File(s) 51 bytes
0 Dir(s) 33.655.316.480 bytes free
C:\my>xcopy C:\windows\Background_mycomputer.cmd C:\my\directory\Background_mycomputer.cmd /D
Overwrite C:\my\directory\Background_mycomputer.cmd (Yes/No/All)? y
C:\windows\Background_mycomputer.cmd
1 File(s) copied
C:\my>xcopy C:\my\directory\Background_mycomputer.cmd C:\windows\Background_mycomputer.cmd /D
0 File(s) copied
C:\my>xcopy C:\windows\Background_mycomputer.cmd C:\my\directory\Background_mycomputer.cmd /D
0 File(s) copied
C:\my>unzip -v
UnZip 6.00 of 20 April 2009, by Info-ZIP. Maintained by C. Spieler. Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.
Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.
Compiled with Microsoft C 13.10 (Visual C++ 7.1) for
Windows 9x / Windows NT/2K/XP/2K3 (32-bit) on Apr 20 2009.
UnZip special compilation options:
ASM_CRC
COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
NTSD_EAS
SET_DIR_ATTRIB
TIMESTAMP
UNIXBACKUP
USE_EF_UT_TIME
USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
UNICODE_SUPPORT [wide-chars] (handle UTF-8 paths)
MBCS-support (multibyte character support, MB_CUR_MAX = 1)
LARGE_FILE_SUPPORT (large files over 2 GiB supported)
ZIP64_SUPPORT (archives using Zip64 for large files supported)
USE_BZIP2 (PKZIP 4.6+, using bzip2 lib version 1.0.5, 10-Dec-2007)
VMS_TEXT_CONV
[decryption, version 2.11 of 05 Jan 2007]
UnZip and ZipInfo environment options:
UNZIP: [none]
UNZIPOPT: [none]
ZIPINFO: [none]
ZIPINFOOPT: [none]
C:\my>ver
Microsoft Windows [Version 6.1.7601]
C:\my>zip -?
Copyright (c) 1990-2006 Info-ZIP - Type 'zip "-L"' for software license.
Zip 2.32 (June 19th 2006). Usage:
zip [-options] [-b path] [-t mmddyyyy] [-n suffixes] [zipfile list] [-xi list]
The default action is to add or replace zipfile entries from list, which
can include the special name - to compress standard input.
If zipfile and list are omitted, zip compresses stdin to stdout.
-f freshen: only changed files -u update: only changed or new files
-d delete entries in zipfile -m move into zipfile (delete files)
-r recurse into directories -j junk (don't record) directory names
-0 store only -l convert LF to CR LF (-ll CR LF to LF)
-1 compress faster -9 compress better
-q quiet operation -v verbose operation/print version info
-c add one-line comments -z add zipfile comment
-# read names from stdin -o make zipfile as old as latest entry
-x exclude the following names -i include only the following names
-F fix zipfile (-FF try harder) -D do not add directory entries
-A adjust self-extracting exe -J junk zipfile prefix (unzipsfx)
-T test zipfile integrity -X eXclude eXtra file attributes
-! use privileges (if granted) to obtain all aspects of WinNT security
-R PKZIP recursion (see manual)
-$ include volume label -S include system and hidden files
-e encrypt -n don't compress these suffixes
C:\my>
Question: I do not want XCOPY to make updates that I know are unnecessary because the timestamp handling is going wrong. How do I prevent that?
As I see it, several things are involved: XCOPY, these specific ZIP and UNZIP tools, and the NTFS file system. Which one is doing something wrong?
I must stress that, apart from ZIP and UNZIP, no other changes are made to the files, such as changing one file and then changing another less than 60 seconds later.
At the time of the test, the time shown was NOT the current time, nor close to it. No file is being set to the current time; the times refer to the last change of the file in question, which may be any time in the past. In this case it's one day later, but it can be anything.
I noticed the peculiar behavior Raymond Chen describes when writing a PowerShell script (GitHub link) to freshen a zip archive using the System.IO.Compression and System.IO.Compression.FileSystem libraries.
Interestingly, Zip archives can store multiple copies of the same file with identical metadata (name, relative path, modification dates). Extracting the second copy of the file will fail in Windows Explorer because the file already exists.
When trying to prevent re-zipping a file that was already archived, I checked the relative path and date, and noticed a discrepancy of up to two seconds in the LastWriteTime. This workaround compensates for the loss of precision:
$AlreadyArchivedFile = ($WriteArchive.Entries | Where-Object {#zip will store multiple copies of the exact same file - prevent this by checking if already archived.
    (($_.FullName -eq $RelativePath) -and ($_.Length -eq $File.Length) ) -and
    # Use TotalSeconds, not Seconds: Seconds is only the seconds component of the TimeSpan, so a difference of exactly one hour would otherwise count as 0.
    ([math]::Abs(($_.LastWriteTime.UtcDateTime - $File.LastWriteTimeUtc).TotalSeconds) -le 2) #ZipFileExtensions timestamps are only precise within 2 seconds.
})
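The two-second granularity comes from the MS-DOS time format used in the basic ZIP headers, and it is easy to reproduce outside PowerShell. The following is a minimal sketch using Python's standard zipfile module with a made-up file name and timestamp; other tools may store additional, more precise timestamps in extra fields, which this sketch does not cover.

```python
import io
import zipfile

buffer = io.BytesIO()
stamp = (2014, 12, 31, 19, 50, 31)  # made-up time with an odd seconds value

# Write one entry with an explicit modification time.
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(zipfile.ZipInfo("example.txt", date_time=stamp), b"hello")

# Read the entry back and look at the time that was actually stored.
with zipfile.ZipFile(buffer) as archive:
    stored = archive.getinfo("example.txt").date_time

print(stamp)   # time that was requested
print(stored)  # time read back: seconds can only be even
```

Any comparison between an archived entry and the file on disk therefore needs a tolerance of a couple of seconds, as in the Where-Object filter above.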
Also, the IsDaylightSavingTime flag is not stored in the Zip archive. As a result I was surprised when extracted files became an hour newer than the original archived file. I tried this several times and saw the extracted file's timestamp incremented by an hour every time it was compressed and extracted.
Here's a very ugly workaround that decreases the archived file time by one hour to make the original source file and extracted file timestamps consistent:
If($File.LastWriteTime.IsDaylightSavingTime() -and $ArchivedFile){#HACK: fix for buggy date - adds an hour inside archive when the zipped file was created during PDT (files created during PST are not affected).
$entry = $WriteArchive.GetEntry($RelativePath)
$entry.LastWriteTime = ($File.LastWriteTime.ToLocalTime() - (New-TimeSpan -Hours 1))
}
There's probably a better way to handle this. Unfortunately I'm not aware of any way to store a Daylight Savings indicator for a file in a .Zip archive, and that information is lost.

Is rsync really any faster on files that have changed?

Why can't I trust rsync to be minimally as fast as cp? (I'm ignoring negligible differences for overhead.)
It seems to me like rsync is fairly slow on files with no content difference, but a changed timestamp.
If I make a file: cp -a testfile-100M destfile
And then I rsync them, I get what you would expect:
$ rsync -av testfile-100M destfile
sending incremental file list
sent 56 bytes received 12 bytes 8.00 bytes/sec
total size is 104857600 speedup is 1542023.53
But that's just because rsync is checking the size and the timestamp and skipping the file. What if I just change the timestamp?
$ touch testfile-100M
$ rsync -av testfile-100M destfile
sending incremental file list
testfile-100M
sent 104870495 bytes received 31 bytes 113804.15 bytes/sec
total size is 104857600 speedup is 1.00
Also note that even though the speedup is 1, the initial copy took about a quarter of the time that the final rsync took, even though the contents are exactly the same. So what's going on here? Is it just all the overhead of doing the comparisons?
If that's the case, then when does rsync ever provide a performance advantage? Only when files are exactly the same on both sides?
For local files, if the size or mtime have changed, rsync by default just copies the whole thing without using its delta algorithm. You can turn this off with the --no-whole-file option, but for local copies this will typically be slower.
For the specific case of touching a file without changing it:
If you give the --size-only option, it will assume that files that have the same size are unchanged.
If you give the --checksum option, it will first hash the file to see if anything has changed, before copying it.
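As a rough illustration of those three behaviours, here is a Python sketch of the "does this file need copying?" decision. It is not rsync's implementation: the function names and mode strings are hypothetical, and SHA-256 is used here simply because it is readily available in Python, not because rsync uses it.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Hash a file in chunks (SHA-256 chosen for the sketch only)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_copy(src, dst, mode="quick"):
    """mode: "quick" (size + mtime), "size-only", or "checksum"."""
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True                      # different size: always copy
    if mode == "size-only":
        return False                     # same size is good enough
    if mode == "checksum":
        return file_digest(src) != file_digest(dst)
    return int(s.st_mtime) != int(d.st_mtime)  # default: compare whole-second mtimes


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "testfile")
    dst = os.path.join(tmp, "destfile")
    for path in (src, dst):
        with open(path, "wb") as f:
            f.write(b"x" * 1024)
    os.utime(dst, (1_000_000_000, 1_000_000_000))
    os.utime(src, (1_000_000_100, 1_000_000_100))  # like touch: newer mtime, same content

    results = {m: needs_copy(src, dst, m) for m in ("quick", "size-only", "checksum")}
    print(results)
```

Only the default check decides to copy the touched file; the checksum mode avoids the copy but pays for reading both files in full.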
When the source and destination are both locally mounted filesystems rsync just copies the file(s) if the timestamps or sizes don't match. Rsync wins where you have large files with small differences and they are on machines separated by a low bandwidth link.
EDIT: Since someone felt the need to downvote this ancient answer... As to why rsync on local files might be slower than cp, there does not seem to be any good reason.
It appears the answer is that rsync does some extra steps in order to keep files in a consistent state, and not in a "partial-transferred" state while operating. Using the --inplace option removes this overhead.
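The difference between the two approaches can be sketched in Python. This is only an illustration of the general "write to a temporary file, then rename" pattern versus writing straight into the destination; the function names and the ".copy-" prefix are made up, and it makes no claim about how rsync implements either mode internally.

```python
import os
import shutil
import tempfile


def copy_via_temp(src, dst):
    """Copy to a temporary file next to dst, then rename it over dst."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, prefix=".copy-")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
        os.replace(tmp_path, dst)  # dst switches from old to new content in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)    # do not leave a partial temporary file behind
        raise


def copy_in_place(src, dst):
    """Write directly into dst; a reader may see a partly written file."""
    with open(src, "rb") as inp, open(dst, "wb") as out:
        shutil.copyfileobj(inp, out)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.bin")
    dst = os.path.join(tmp, "dst.bin")
    with open(src, "wb") as f:
        f.write(b"new content")
    with open(dst, "wb") as f:
        f.write(b"old content")

    copy_via_temp(src, dst)
    with open(dst, "rb") as f:
        via_temp_result = f.read()
    leftovers = [n for n in os.listdir(tmp) if n.startswith(".copy-")]

    copy_in_place(src, dst)
    with open(dst, "rb") as f:
        in_place_result = f.read()

print(via_temp_result, leftovers, in_place_result)
```

The temporary-file variant does extra work (a second file and a rename), which is the kind of overhead being described, in exchange for never exposing a half-written destination.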
Interestingly, for me rsync is about 4× faster than cp for copying to an external USB drive.

Resources