Pipe filelist into parallel rsync - terminal

Answers to this question do not seem to work for me to parallelize rsync from my local machine to an AWS instance.
I have done a dry run and obtained a filelist with filepaths for each file to be transferred, split this filelist into 59 smaller filelists residing in the split directory. I then created a new text file with the name of each filelist, split_list.txt. I'm trying to open up this filelist of filelists to pipe into 59 parallel rsync runs:
cat split_list.txt | parallel --will-cite -j 59 rsync --relative --human-readable --ignore-existing --files-from={} -rave "ssh -i key.pem" /Volumes/BRIENNAKH\ 4/xml/ [REMOTE-HOST]:~/arxiv/xml
I thought that by doing cat split_list.txt I would be passing to the second argument the path of each filelist in the split directory, and that the --files-from={} argument would be receiving each filelist.
It doesn't run and instead returns
Copyright (C) 1996-2006 by Andrew Tridgell, Wayne Davison, and others.
<http://rsync.samba.org/>
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles,
inplace, IPv6, 64-bit system inums, 64-bit internal inums
rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions. See the GNU
General Public Licence for details.
rsync is a file transfer program capable of efficient remote update
via a fast differencing algorithm.
Usage: rsync [OPTION]... SRC [SRC]... DEST
or rsync [OPTION]... SRC [SRC]... [USER@]HOST:DEST
or rsync [OPTION]... SRC [SRC]... [USER@]HOST::DEST
or rsync [OPTION]... SRC [SRC]... rsync://[USER@]HOST[:PORT]/DEST
or rsync [OPTION]... [USER@]HOST:SRC [DEST]
or rsync [OPTION]... [USER@]HOST::SRC [DEST]
or rsync [OPTION]... rsync://[USER@]HOST[:PORT]/SRC [DEST]
The ':' usages connect via remote shell, while '::' & 'rsync://' usages connect
to an rsync daemon, and require SRC or DEST to start with a module name.
Options
-v, --verbose increase verbosity
-q, --quiet suppress non-error messages
--no-motd suppress daemon-mode MOTD (see manpage caveat)
-c, --checksum skip based on checksum, not mod-time & size
-a, --archive archive mode; same as -rlptgoD (no -H)
--no-OPTION turn off an implied OPTION (e.g. --no-D)
-r, --recursive recurse into directories
-R, --relative use relative path names
--no-implied-dirs don't send implied dirs with --relative
-b, --backup make backups (see --suffix & --backup-dir)
--backup-dir=DIR make backups into hierarchy based in DIR
--suffix=SUFFIX set backup suffix (default ~ w/o --backup-dir)
-u, --update skip files that are newer on the receiver
--inplace update destination files in-place (SEE MAN PAGE)
--append append data onto shorter files
-d, --dirs transfer directories without recursing
-l, --links copy symlinks as symlinks
-L, --copy-links transform symlink into referent file/dir
--copy-unsafe-links only "unsafe" symlinks are transformed
--safe-links ignore symlinks that point outside the source tree
-k, --copy-dirlinks transform symlink to a dir into referent dir
-K, --keep-dirlinks treat symlinked dir on receiver as dir
-H, --hard-links preserve hard links
-p, --perms preserve permissions
--executability preserve the file's executability
--chmod=CHMOD affect file and/or directory permissions
-o, --owner preserve owner (super-user only)
-g, --group preserve group
--devices preserve device files (super-user only)
--specials preserve special files
-D same as --devices --specials
-t, --times preserve times
-O, --omit-dir-times omit directories when preserving times
--super receiver attempts super-user activities
-S, --sparse handle sparse files efficiently
-n, --dry-run show what would have been transferred
-W, --whole-file copy files whole (without rsync algorithm)
-x, --one-file-system don't cross filesystem boundaries
-B, --block-size=SIZE force a fixed checksum block-size
-e, --rsh=COMMAND specify the remote shell to use
--rsync-path=PROGRAM specify the rsync to run on the remote machine
--existing skip creating new files on receiver
--ignore-existing skip updating files that already exist on receiver
--remove-source-files sender removes synchronized files (non-dirs)
--del an alias for --delete-during
--delete delete extraneous files from destination dirs
--delete-before receiver deletes before transfer (default)
--delete-during receiver deletes during transfer, not before
--delete-after receiver deletes after transfer, not before
--delete-excluded also delete excluded files from destination dirs
--ignore-errors delete even if there are I/O errors
--force force deletion of directories even if not empty
--max-delete=NUM don't delete more than NUM files
--max-size=SIZE don't transfer any file larger than SIZE
--min-size=SIZE don't transfer any file smaller than SIZE
--partial keep partially transferred files
--partial-dir=DIR put a partially transferred file into DIR
--delay-updates put all updated files into place at transfer's end
-m, --prune-empty-dirs prune empty directory chains from the file-list
--numeric-ids don't map uid/gid values by user/group name
--timeout=TIME set I/O timeout in seconds
-I, --ignore-times don't skip files that match in size and mod-time
--size-only skip files that match in size
--modify-window=NUM compare mod-times with reduced accuracy
-T, --temp-dir=DIR create temporary files in directory DIR
-y, --fuzzy find similar file for basis if no dest file
--compare-dest=DIR also compare destination files relative to DIR
--copy-dest=DIR ... and include copies of unchanged files
--link-dest=DIR hardlink to files in DIR when unchanged
-z, --compress compress file data during the transfer
--compress-level=NUM explicitly set compression level
-C, --cvs-exclude auto-ignore files the same way CVS does
-f, --filter=RULE add a file-filtering RULE
-F same as --filter='dir-merge /.rsync-filter'
repeated: --filter='- .rsync-filter'
--exclude=PATTERN exclude files matching PATTERN
--exclude-from=FILE read exclude patterns from FILE
--include=PATTERN don't exclude files matching PATTERN
--include-from=FILE read include patterns from FILE
--files-from=FILE read list of source-file names from FILE
-0, --from0 all *-from/filter files are delimited by 0s
--address=ADDRESS bind address for outgoing socket to daemon
--port=PORT specify double-colon alternate port number
--sockopts=OPTIONS specify custom TCP options
--blocking-io use blocking I/O for the remote shell
--stats give some file-transfer stats
-8, --8-bit-output leave high-bit chars unescaped in output
-h, --human-readable output numbers in a human-readable format
--progress show progress during transfer
-P same as --partial --progress
-i, --itemize-changes output a change-summary for all updates
--out-format=FORMAT output updates using the specified FORMAT
--log-file=FILE log what we're doing to the specified FILE
--log-file-format=FMT log updates using the specified FMT
--password-file=FILE read password from FILE
--list-only list the files instead of copying them
--bwlimit=KBPS limit I/O bandwidth; KBytes per second
--write-batch=FILE write a batched update to FILE
--only-write-batch=FILE like --write-batch but w/o updating destination
--read-batch=FILE read a batched update from FILE
--protocol=NUM force an older protocol version to be used
-E, --extended-attributes copy extended attributes
--cache disable fcntl(F_NOCACHE)
-4, --ipv4 prefer IPv4
-6, --ipv6 prefer IPv6
--version print version number
(-h) --help show this help (-h works with no other options)
Use "rsync --daemon --help" to see the daemon-mode command-line options.
Please see the rsync(1) and rsyncd.conf(5) man pages for full documentation.
See http://rsync.samba.org/ for updates, bug reports, and answers
rsync error: syntax or usage error (code 1) at /BuildRoot/Library/Caches/com.apple.xbs/Sources/rsync/rsync-51/rsync/options.c(1436) [client=2.6.9]
rsync version 2.6.9 protocol version 29
How should I be formatting my command so that I can do what I intended to do?

I can confirm that I'm encountering the same "bug" when trying a very simple case of rsync with the "--files-from" switch. Any attempt to use it results in a dump of the help message, although rsync doesn't complain about an invalid option. Research on the net suggests that there is or was a bug in the "--relative" option, which is implied by "--files-from".
My workaround has been to replace "--files-from=list-file" with `cat list-file` (command substitution, using back quotes) in the command line.
I'm using Lubuntu 18.04 LTS and the rsync version is 3.1.2, protocol version 31, from rsync.samba.org. That's what's packaged with Lubuntu (Ubuntu).
Who knows what's really going on?
[UPDATE] Well, after some further digging I discovered that --files-from=list.txt must be used in conjunction with the source directory. I naively thought the paths in the list would be relative to the current directory. They are not. They are relative to the source directory.
So the following should work:
% rsync -a -r -v --files-from=/someSrc/list.txt /someSrc [remote:]/someDest
and it does in simple cases. Test your complex case by testing a simple subset first.
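To make the "relative to the source directory" point concrete for the parallel case in the question, here is a minimal local sketch that runs one rsync per file list using xargs -P rather than GNU parallel. Everything in it is an assumption for illustration: the temp directories, the list names list_00 and list_01, and the job count of 2. For the real transfer you would add -e "ssh -i key.pem" and the remote destination. xargs executes rsync directly, so the source path containing a space is passed through as a single argument.

```shell
#!/usr/bin/env bash
# Sketch only: all paths below are throwaway temp directories (hypothetical).
work=$(mktemp -d)
src="$work/xml source"          # contains a space on purpose
dest="$work/dest"
mkdir -p "$src/a" "$src/b" "$dest" "$work/split"
echo one > "$src/a/one.xml"
echo two > "$src/b/two.xml"

# Each list holds paths RELATIVE to the source directory, one per line.
printf 'a/one.xml\n' > "$work/split/list_00"
printf 'b/two.xml\n' > "$work/split/list_01"

# Feed the list names to xargs; it starts up to 2 rsync processes at once,
# substituting each list path for {}.
printf '%s\n' "$work"/split/list_* |
  xargs -P 2 -I {} rsync -a --ignore-existing --files-from={} "$src/" "$dest/"

# Show what arrived.
find "$dest" -type f | sort
```

If GNU parallel is preferred, the same idea should carry over, but since parallel hands the assembled command to a shell, the path with a space and the quoted ssh command probably need an extra level of quoting there.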

Related

Rsync file automatically creating directory

I'm a beginner with rsync.
I have a file "filelist.txt" listing some files with their full paths:
/tmp/folder1/file.txt
/tmp/folder2/file.txt
/tmp/folder2/file.txt
I want to copy these files from server A to server B, and create directories if needed.
This file can change several times a day, so I don't want to handle directory creation manually on the other server before transferring files.
So I used:
cat filelist.txt | xargs -I {} rsync -r {} admin@riw-appcmd.i-wel.fr:{}
But for each line I get:
rsync: change_dir#3 "/tmp/folder1" failed: No such file or directory (2)
rsync error: errors selecting input/output files, dirs (code 3) at main.c(632) [receiver=3.0.3]
rsync: connection unexpectedly closed (8 bytes received so far) [sender]
What am I doing wrong?
You can use the --relative option. For example:
rsync --files-from filelist.txt -R -av / user@host:/
According to rsync manual:
-R, --relative
Use relative paths. This means that the full path names specified on the command line are sent to the server rather than just the last parts of the filenames. This is particularly useful when you want to send several different directories at the same time. For example, if you used this command:
rsync -av /foo/bar/baz.c remote:/tmp/
... this would create a file named baz.c in /tmp/ on the remote machine. If instead you used
rsync -avR /foo/bar/baz.c remote:/tmp/
then a file named /tmp/foo/bar/baz.c would be created on the remote machine, preserving its full path. These extra path elements are called "implied directories" (i.e. the "foo" and the "foo/bar" directories in the above example).
[...]
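The difference the manual excerpt describes can be tried locally without a second server. This is a small sketch under assumptions: the directory names (src, plain, relative, fromlist) and filelist.txt are made up for the demo, and relative paths are used so the result does not depend on where the temp directory lives.

```shell
#!/usr/bin/env bash
# Sketch: plain copy vs. -R (--relative) vs. --files-from, temp dirs only.
work=$(mktemp -d)
mkdir -p "$work/src/foo/bar" "$work/plain" "$work/relative" "$work/fromlist"
echo 'int main(void){return 0;}' > "$work/src/foo/bar/baz.c"

cd "$work/src" || exit 1

# Without -R only the last path component arrives: plain/baz.c
rsync -a foo/bar/baz.c "$work/plain/"

# With -R the path as given is recreated: relative/foo/bar/baz.c
rsync -aR foo/bar/baz.c "$work/relative/"

# A file list implies -R; entries are read relative to the source argument (".").
printf 'foo/bar/baz.c\n' > "$work/filelist.txt"
rsync -a --files-from="$work/filelist.txt" . "$work/fromlist/"

# List the three results side by side.
find "$work/plain" "$work/relative" "$work/fromlist" -type f | sort
```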

Unwanted names in path while rsyncing to NFS-Share

I am using a NAS to back up my file server. The NAS exports /share/Backup via NFS, which is mounted on the file server as /mnt/qnap. I want to keep track of which files are rsynced but exclude the Backup directory, which contains many small files. Therefore I am running two instances of rsync, one with -v and another one without. The following command works as it should: after executing it, the directory structure on /mnt/qnap is identical to /mnt/btrfs-raid.
rsync --delete -av --exclude Backup /mnt/btrfs-raid/ /mnt/qnap/
Rsyncing the Backup folder with the command
rsync --delete -av /mnt/btrfs-raid/Backup /mnt/qnap/Backup
produces the following directory structure on the NAS:
/mnt/qnap/Backup/Backup/..Subdirectories
To get the result I want I have to delete the last "Backup" from the target directory path:
rsync --delete -av /mnt/btrfs-raid/Backup /mnt/qnap/
Why does the second example not work like the first one?
Thanks
Stefan
Trailing slashes in paths are important for rsync. See the documentation.
rsync -avz foo:src/bar /data/tmp
This would recursively transfer all files from the directory src/bar on the machine foo into the /data/tmp/bar directory on the local machine. The files are transferred in "archive" mode, which ensures that symbolic links, devices, attributes, permissions, ownerships, etc. are preserved in the transfer. Additionally, compression will be used to reduce the size of data portions of the transfer.
rsync -avz foo:src/bar/ /data/tmp
A trailing slash on the source changes this behavior to avoid creating an additional directory level at the destination. You can think of a trailing / on a source as meaning "copy the contents of this directory" as opposed to "copy the directory by name", but in both cases the attributes of the containing directory are transferred to the containing directory on the destination. In other words, each of the following commands copies the files in the same way, including their setting of the attributes of /dest/foo:
rsync -av /src/foo /dest
rsync -av /src/foo/ /dest/foo
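The extra Backup/Backup level from the question can be reproduced locally. A minimal sketch, assuming throwaway directories named raid, nas1 and nas2 that stand in for the real mounts:

```shell
#!/usr/bin/env bash
# Sketch: effect of a trailing slash on the rsync source (temp dirs only).
work=$(mktemp -d)
mkdir -p "$work/raid/Backup" "$work/nas1/Backup" "$work/nas2/Backup"
echo data > "$work/raid/Backup/file.txt"

# No trailing slash: the directory itself is copied INTO the target,
# giving nas1/Backup/Backup/file.txt
rsync -a "$work/raid/Backup" "$work/nas1/Backup"

# Trailing slash: only the contents are copied, giving nas2/Backup/file.txt
rsync -a "$work/raid/Backup/" "$work/nas2/Backup"

find "$work/nas1" "$work/nas2" -type f | sort
```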

Dealing with spaces in rsync path arguments

So I'm currently trying to write a few small scripts that allow me to manage my iTunes library of which I have clones on multiple OS X machines.
The basic idea is that I have a NAS holding a copy of the library that is used as an intermediate "master copy" since the machines holding the actually used copies aren't available all the time. If I want to update my old copy on machine B with the newer version from machine A, I'd then update the NAS copy based on machine A's current state, then update machine B from the updated NAS copy possibly at a later time.
The script files are located on the NAS within the same folder that also houses the iTunes directory. Since I'm mounting the NAS as a volume via AFP, I simply open a Finder window with the directory containing the scripts and drag'n'drop the script I want to use to a Terminal window for easy execution.
This is my attempt at the "update NAS from local copy" script:
rsync -avz --compress-level 1 --exclude 'Mobile Applications/*.ipa' --delete --delay-updates -n "$(echo $HOME | sed 's/ /\\ /g')/Music/iTunes" "$(dirname $0 | sed 's/ /\\ /g')"
(-n option of course only for testing the script)
Since there will be spaces in the paths I supply rsync with, I already figured out that I'd need to escape those somehow. I also know that the standard way to do that on OS X is to prepend all the spaces with a backslash, at least when manually typing paths in Terminal. But the code above still won't work – rsync complains that it cannot change to the directory I supplied, although the path it spits out in the error message seems to be perfectly fine and can be cd'd to, if you remove the double quotes around it first:
[...]
building file list ... rsync: change_dir "/Volumes/Macintosh\ HD/Users/Julian/Music" failed: No such file or directory (2)
done
[...]
If I remove the surrounding double quotes in the script itself, rsync seems to not honor the escaping backslashes at all and still treat the space following the backslash as a path separator:
[...]
building file list ... rsync: link_stat "/Volumes/Macintosh\" failed: No such file or directory (2)
rsync: change_dir "/Volumes/Macintosh HD/Users/Julian/HD/Users/Julian/Music" failed: No such file or directory (2)
done
[...]
And no, I can't work around the issue by shortening /Volumes/Macintosh\ HD/Users/Julian/Music to /Users/Julian/Music since this machine has multiple HDDs and / is not the same disk/partition as /Volumes/Macintosh\ HD. So I need to find a solution for this specific problem.
I'm seriously lost now.
Can anyone please explain to me what I need to change in order to have rsync recognize the paths correctly?
After messing around quite a bit more and finding this question, I managed to develop a working solution:
localpath=$HOME/Music/iTunes
remotepath="$(dirname "$0")/"
rsync -avz \
--compress-level 1 \
--exclude 'Mobile Applications/*.ipa' \
--delete \
--delay-updates \
-n \
"$localpath" \
"$remotepath"
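Why the quoted variables work while the sed escaping did not can be seen without running rsync at all. The sketch below uses a hypothetical helper, count_args, that just reports how many arguments it received. The path is the one from the question and does not need to exist.

```shell
#!/usr/bin/env bash
# Sketch: how quoting affects a path with spaces (no rsync needed).
path="/Volumes/Macintosh HD/Users/Julian/Music"

# Hypothetical helper: prints the number of arguments it was called with.
count_args() { echo "$#"; }

# Quoted expansion stays one argument.
quoted=$(count_args "$path")

# Unquoted expansion is split at the space into two arguments.
unquoted=$(count_args $path)

# Inserting backslashes puts literal backslashes into the string; inside
# double quotes they are not removed again, so the path no longer matches.
escaped=$(printf '%s' "$path" | sed 's/ /\\ /g')

echo "quoted:   $quoted argument(s)"
echo "unquoted: $unquoted argument(s)"
echo "escaped:  $escaped"
```

Backslash escaping is only meaningful when the shell parses text typed on the command line. A value already stored in a variable just needs double quotes around the expansion.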

Is there a way to move files from one set of directories to another set of corresponding directories

I take delivery of files from multiple places as part of a publishing aggregation service. I need a way to move files that have been delivered to me from one location to another without losing the directory listings for sorting purposes.
Example:
Filepath of delivery: Server/Vendor/To_Company/Customer_Name/**
Filepath of processing: ~/Desktop/MM-DD-YYYY/Returned_Files/Customer_Name/**
I know I can move all of the directories by doing something such as:
find Server/Vendor/To_Company/* -exec mv -n {} ~/Desktop/MM-DD-YYYY/Returned_Files \;
but using that I can only run the script one time per day and there are times when I might need to run it multiple times.
It seems like ideally I should be able to create a copycat directory in my daily processing folder and then move the files from one to the other.
You can use the rsync command with the --remove-source-files option. You can run it as many times as needed.
#for trial run, without making any actual transfer.
rsync --dry-run -rv --remove-source-files Server/Vendor/To_Company/ ~/Desktop/MM-DD-YYYY/Returned_Files/
#command
rsync -rv --remove-source-files Server/Vendor/To_Company/ ~/Desktop/MM-DD-YYYY/Returned_Files/
reference:
http://www.cyberciti.biz/faq/linux-unix-bsd-appleosx-rsync-delete-file-after-transfer/
You could use rsync to do this for you:
rsync -a --remove-source-files /Server/Vendor/To_Company/Customer_Name ~/Desktop/$(date +"%m-%d-%Y")/Returned_Files/
Add -n to do a dry run to make sure it does what you want.
From the manual page:
--remove-source-files
This tells rsync to remove from the sending side the files (meaning non-directories) that are a part of the transfer and have been successfully duplicated on the receiving side.
Note that you should only use this option on source files that are quiescent. If you are using this to move files that show up in a particular directory over to another host, make sure that the finished files get renamed into the source directory, not directly written into it, so that rsync can’t possibly transfer a file that is not yet fully written. If you can’t first write the files into a different directory, you should use a naming idiom that lets rsync avoid transferring files that are not yet finished (e.g. name the file "foo.new" when it is written, rename it to "foo" when it is done, and then use the option --exclude='*.new' for the rsync transfer).
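Running the move more than once per day, as the question asks, might look like the sketch below. It is local only, and the names Customer_A, file1.txt and file2.txt are invented for the demo. The dated folder uses the MM-DD-YYYY layout from the question.

```shell
#!/usr/bin/env bash
# Sketch: repeated "move" runs with --remove-source-files (temp dirs only).
work=$(mktemp -d)
src="$work/Server/Vendor/To_Company"
dest="$work/Desktop/$(date +%m-%d-%Y)/Returned_Files"
mkdir -p "$src/Customer_A" "$dest"

# First delivery of the day.
echo data > "$src/Customer_A/file1.txt"
rsync -a --remove-source-files "$src/" "$dest/"

# A later delivery lands in the same customer directory.
echo more > "$src/Customer_A/file2.txt"
rsync -a --remove-source-files "$src/" "$dest/"

# Files have moved; the now-empty source directories are left in place.
find "$work" -mindepth 1 | sort
```

Since only non-directories are removed from the source, the customer folders stay behind and are ready for the next delivery.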

Rsync does not properly set permissions on Windows folder

I'm using rsync on Windows 7 (in particular, cwrsync). I'm using a simple command as such:
rsync -r --perms --delete /cygdrive/c/Users/Michael/Documents/Personal/ /cygdrive/c/Users/Michael/Documents/Personal_Backup/
The recursive copy works fine, except if I was to (right-click/Properties/Security tab) on any folder created by rsync on the destination; I get the following pop-up message:
The permissions on {folderName} are incorrectly ordered, which may cause
some entries to be ineffective.
I also tried the --acls option but get the following error:
recv_acl_access: value out of range: ff rsync error: error in rsync
protocol data stream (code 12) at acls.c(690) [Receiver=3.0. rsync:
connection unexpectedly closed (9 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at
io.c(610) [sender=3.0.8]
In any case, I just want to use rsync correctly so that viewing the Security permissions in Windows won't throw an error.
Michael,
This solution suggests that you should not be using --perms, but using --chmod=ugo=rwX instead.
Good luck!
Dotan
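For anyone unsure what the ugo=rwX mode string means: rsync's --chmod takes chmod-style symbolic modes, so its effect can be previewed with plain chmod on a POSIX system. This is only a sketch of the mode semantics, not of how cwRsync maps them to Windows ACLs. The file names notes.txt and run.sh are made up.

```shell
#!/usr/bin/env bash
# Sketch: what the symbolic mode ugo=rwX does, shown with plain chmod.
work=$(mktemp -d)
mkdir -p "$work/Personal_Backup/sub"
echo text > "$work/Personal_Backup/sub/notes.txt"
printf '#!/bin/sh\necho hi\n' > "$work/Personal_Backup/sub/run.sh"

# Start from restrictive modes: a plain file, an executable script, a directory.
chmod 600 "$work/Personal_Backup/sub/notes.txt"
chmod 700 "$work/Personal_Backup/sub/run.sh" "$work/Personal_Backup/sub"

# rw for user, group and others; capital X adds execute only to directories
# and to files that already had an execute bit.
chmod -R ugo=rwX "$work/Personal_Backup"

ls -lR "$work/Personal_Backup"
```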
One word, Robocopy.
I had exactly the same issues with borked permissions while using cwRsync. I tried numerous things but none seemed to work, so I eventually gave up.
Robocopy is a default Windows tool and has a similar (for your purpose, the same) feature set.
I discovered it last night and ditched rsync completely. rsync is built for Unix-like systems, so some sort of bummer is to be expected on Windows.
This got me started:
http://www.sevenforums.com/tutorials/187346-robocopy-create-backup-script.html
Here's the little backup script I made for myself to mirror my partitions to an external drive.
Don't look back to rsync any more.
I use the now-deprecated cacls to add myself back in after the copy occurs.
rsync -avASPC sourceDir/* destDir
cacls destDir /t /e /r doej
cacls destDir /t /e /g doej:f
Where sourceDir is the source directory and destDir is the destination directory and doej is the username. It would probably be better to use icacls, but I haven't learned it yet.
I also tried robocopy, but I did not have the permissions I needed to make that work, it seems.
Flags used for rsync
-a, --archive archive mode; equals -rlptgoD (no -H,-A,-X)
-r, --recursive recurse into directories
-l, --links copy symlinks as symlinks
-p, --perms preserve permissions
-t, --times preserve modification times
-g, --group preserve group
-o, --owner preserve owner (super-user only)
-D same as --devices --specials
--devices preserve device files (super-user only)
--specials preserve special files
-v, --verbose increase verbosity
-S, --sparse handle sparse files efficiently
-A, --acls preserve ACLs (implies -p, which is also implied by -a)
-P same as --partial --progress
--progress show progress during transfer
--partial keep partially transferred files
-C, --cvs-exclude auto-ignore files in the same way CVS does
Flags used from CACLS
/T Changes ACLs of specified files in the current directory and all subdirectories.
/E Edit ACL instead of replacing it.
/R user Revoke specified user's access rights (only valid with /E).
/P user:perm Replace specified user's access rights.
Perm can be: ...
F Full control
