Unable to exclude directories from s3cmd sync

I'm having an issue excluding directories on a nightly backup script using S3cmd.
I'm trying to exclude certain files from being backed up (log files, etc...)
My server structure is:
/srv/users/USERNAME/
/srv/users/USERNAME2/
etc...
So I'm running an s3cmd on cron, similar to:
s3cmd sync --config=/path/to/config/.s3cfg --delete-removed --exclude-from=/path/to/exclude/backups.exclude /srv/ s3://aws-bucket/
where my backups.exclude file contains:
/srv/users/*/log
/srv/users/*/run
As well as some other similar directories. As should be obvious, I'd like to catch all user directories with that wildcard to exclude them from backup.
However, it doesn't seem to be working. I'm currently running s3cmd version 2 (I upgraded to see if maybe it was a bug).
Thanks!

OK, I figured this out: it was related to the paths. For the sync source, I used /srv (no trailing slash) instead of /srv/, and I also updated the excludes file to remove the leading slash and add a trailing slash, e.g. srv/users/*/log/.
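For anyone landing here with the same problem, a minimal sketch of the corrected setup, using the same placeholder paths as above:
s3cmd sync --config=/path/to/config/.s3cfg --delete-removed --exclude-from=/path/to/exclude/backups.exclude /srv s3://aws-bucket/
with backups.exclude now containing source-relative patterns with trailing slashes:
srv/users/*/log/
srv/users/*/run/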

Related

s3cmd sync is remote copying the wrong files to the wrong locations

I've got the following as part of a shell script to copy site files up to a S3 CDN:
for i in "${S3_ASSET_FOLDERS[#]}"; do
s3cmd sync -c /path/to/.s3cfg --recursive --acl-public --no-check-md5 --guess-mime-type --verbose --exclude-from=sync_ignore.txt /path/to/local/${i} s3://my.cdn/path/to/remote/${i}
done
Say S3_ASSET_FOLDERS is:
("one/" "two/")
and say both of those folders contain a file called... "script.js"
and say I've made a change to two/script.js - but not touched one/script.js
Running the above command will first copy the file from /one/ to the correct location, although I've no idea why it thinks it needs to:
INFO: Sending file '/path/to/local/one/script.js', please wait...
File '/path/to/local/one/script.js' stored as 's3://my.cdn/path/to/remote/one/script.js' (13551 bytes in 0.1 seconds, 168.22 kB/s) [1 of 0]
... and then a remote copy operation for the second folder:
remote copy: two/script.js -> script.js
What's it doing? Why?? Those files aren't even similar. Different modified times, different checksums. No relation.
And I end up with an s3 bucket with two incorrect files in. The file in /two/ that should have been updated, hasn't. And the file in /one/ that shouldn't have changed is now overwritten with the contents of /two/script.js
Clearly I'm doing something bizarrely stupid because I don't see anyone else having the same issue. But I've no idea what??
First of all, try running it without the --no-check-md5 option.
Second, I suggest you pay attention to directory names, specifically trailing slashes.
s3cmd documentation says:
With directories there is one thing to watch out for – you can either upload the directory and its contents or just the contents. It all depends on how you specify the source.
To upload a directory and keep its name on the remote side specify the source without the trailing slash
On the other hand, to upload just the contents, specify the directory with a trailing slash
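Applied to your loop, both suggestions together would look something like this (the ${i%/} expansion, which strips the trailing slash from each folder name, is my addition):
for i in "${S3_ASSET_FOLDERS[@]}"; do
    # No --no-check-md5, and no trailing slash on the source,
    # so the folder name ("one", "two") is kept on the remote side.
    s3cmd sync -c /path/to/.s3cfg --recursive --acl-public --guess-mime-type --verbose --exclude-from=sync_ignore.txt /path/to/local/${i%/} s3://my.cdn/path/to/remote/
done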

rsync with folder and file name pattern matching to copy files

Right now I'm successfully running:
rsync -uvma --include="*/" --include="*.css" --exclude="*" $spec_dir $css_spec_dir
In a shell script which copies all of the files in the source directory, that are .css files, into a target directory.
I want to do the same for HTML files, but only where they are in a subfolder with the name 'template'.
So I'm in directory ~/foo, and I want to rsync where the --include="*/" only matches on subfolders with the name 'template'. So ~/foo/bar/template/baz/somefile.html would match, and so would ~/foo/bar/baz/qux/template/someotherfile.html, but NOT ~/foo/bar/thirdfile.html
Although it looks a little bit strange, this works for me:
rsync -uvma --include="*/" --include="*/template/*/*.html" --include="*/template/*.html" --include="template/*.html" --include="template/*/*.html" --exclude="*" $spec_dir $html_spec_dir
This one works for me:
rsync -umva --include="**/templates/**/*.html" --exclude="*.html" source/ target
Were you looking for **? Here you have to be careful about choosing your exclude pattern: * won't work, as it matches directories along the way. If rsync finds foo/templates/some.html, it will first copy foo, then foo/templates, and then foo/templates/some.html; but before it gets there, * has already matched foo and nothing gets copied.
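To make that concrete, here is the difference between the two exclude patterns on this kind of structure (source/ and target are placeholders):
# Copies nothing: '*' matches the top-level directories themselves,
# so rsync never descends far enough to reach the .html files.
rsync -umva --include="**/templates/**/*.html" --exclude="*" source/ target
# Works: directories don't match '*.html', so rsync can still traverse them,
# and -m prunes the directories that end up empty.
rsync -umva --include="**/templates/**/*.html" --exclude="*.html" source/ target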
Here's what worked:
rsync -uvma --include="*/" --include="templates/**.html" --exclude="*" $html_all_dir $html_dir
My guess is, your format and mine probably accomplish the same thing. I know I tried about 20 different patterns before this one, and this is the only one that worked properly. I don't think I tried your format though :)

How to exclude particular files from being copied by rsync?

I am reading rsync docs, INCLUDE/EXCLUDE PATTERN RULES section. Following the rules explained there I would like to exclude the following folders and files:
all .metadata folders
all *.DS_Store* files
So, I am creating rules like:
- .DS_Store
.metadata/
But files and folders are not excluded. What am I doing incorrectly?
The following will skip everything in the .metadata directories plus the .metadata directories themselves, as well as the .DS_Store files, and works with the rsync distributed with Mavericks: rsync --exclude='.DS_Store' --exclude='.metadata' <your_source_dir> <your_destination_dir>.
The --exclude=<pattern> is actually just a shorthand for --filter='- <pattern>'. This means --exclude='.DS_Store' and --filter='- .DS_Store' are equivalent. The same goes for --include=<pattern>, which is actually just a shorthand for --filter='+ <pattern>'.
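For example, these two invocations should behave identically (src/ and dst/ are placeholders):
rsync -a --exclude='.DS_Store' --exclude='.metadata' src/ dst/
rsync -a --filter='- .DS_Store' --filter='- .metadata' src/ dst/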

How to backup with s3cmd, ignoring multiple directories and file types

I've been trying to figure out how to back up the contents of the user folders on my file server (CentOS via smb), ignoring certain file types and directories. It seems like this should be easy, but I'm not getting anywhere on figuring out how to ignore multiple directories.
I'd like to ignore the following:
all files and directories starting with a . or a _
all MS Office temp files (eg ~$*)
lock files (eg .lock)
I've tried a bunch of different combinations of the --exclude flag, but can't get any to work right.
This is the command that makes the most sense, but it's not excluding anything:
s3cmd sync --dry-run --verbose --delete-removed --exclude '.*' '_*' '~$*' '*.lock' /home/user-folder s3://bucket-name/
If you are already using .gitignore, you can do something like
s3cmd sync --exclude '.git/*' --exclude-from .gitignore <local_dir> s3://<bucket>/
as stated in this blog post and confirmed by the documentation for --exclude-from from the official docs (Ctrl+F and search for "exclude-from").
It works great, with one minor drawback: if you're excluding a folder within .gitignore, you must also exclude its contents, or s3cmd will grab them. However, this is easy to do: just add a line like <foldername>/* to the .gitignore and everything will be OK.
EDIT:
Better yet, set up a dedicated .s3ignore file and just refer to it from the sync command:
s3cmd sync --exclude-from .s3ignore <local_dir> s3://<bucket>/
.s3ignore example:
.git
.git/*
.gitignore
node_modules
node_modules/*
*.swo
*.swp
*.pyo
*.pyc
I've done something similar. The key is to use --exclude before each pattern you want to match:
s3cmd -v --recursive --exclude ".ts" --exclude ".aac" --exclude "/thumbnails" put /var/www/folder s3://bucket/
Also I managed to use .ts without the wildcard symbol and it worked in my case!
Other answers mention passing --exclude <pattern> for each pattern, or packing all patterns into a file to pass with --exclude-from <file>.
Using a regex:
You can also pack all the patterns into a single regular expression and pass it with the --rexclude option. Note that --rexclude takes a regular expression matched against the file's path, not a glob. A pattern covering the cases from the question (path components starting with . or _, Office temp files starting with ~$, and .lock files) would be:
s3cmd sync --dry-run --verbose --delete-removed --rexclude '(^|/)([._]|~\$)|\.lock$' /home/user-folder s3://bucket-name/
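As a quick sanity check of that pattern with GNU grep (s3cmd compiles Python regexes, but these constructs behave the same in both):
printf '%s\n' .hidden _private '~$doc.docx' file.lock normal.txt | grep -E '(^|/)([._]|~\$)|\.lock$'
This prints every name except normal.txt.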

Excluding folders in Winrar

A Day with Winrar
All I wanted to do was exclude folders and their contents using wildcards, and even after reading the docs, it turned into a guessing game...
So my test bed looks like:
C:\!tmp1\f1
C:\!tmp1\f1\f1.txt
C:\!tmp1\f1\a
C:\!tmp1\f1\a\a.txt
C:\!tmp1\f2
C:\!tmp1\f2\f2.txt
C:\!tmp1\f2\a
C:\!tmp1\f2\a\a.txt
And I am executing:
C:\>"c:\program files\winrar\winrar.exe" a -r !tmp1.rar !tmp1
which gives me a rar with !tmp1 as the root (sole top level folder).
The exclude switch is -x<filepathpattern> and may be included multiple times.
So, given that we want to exclude f2, and all its subcontents...
-x*\f2\*
removes the contents, but leaves f2
-xf2
does nothing - includes all
-x\f2
does nothing - includes all
-x*\f2
does nothing - includes all (now I'm mad), so surely it must be..
-x\f2\
nope, does nothing - includes all. So it has GOT to be...
-x*\f2\
hell no, does nothing - includes all. And I already know that
-x*\f2\*
removes the contents, but leaves f2. Onward we go...
-x*f2\
does nothing - includes all. Grrrr. Aha! how about...
-x!tmp1\f2\
nope, does nothing - includes all. WTF. Alright, so it has GOT to be...
-x!tmp1\f2
Holy moly, it worked! Hmmm, then how come....
-x*\f2
does not work? This was the little demon that sent me down this crazed path to begin with and should have worked!
Given all that, do I dare try to go after */a/* directories, removing contents and the dirs?
-x*\a
does not work, of course, does nothing.
-x*\*\a
does not work, of course, does nothing.
-x!tmp1\*\a
nope. But...
-x*\a\*
removes the contents of both dirs, but leaves the folders. So, in desperation, I could use the -ed switch, which will not store empty folders; but this is a broad hack. I want to eliminate the folders specified, not all empty folders.
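For the record, that broad hack would look something like:
C:\>"c:\program files\winrar\winrar.exe" a -r -ed -x*\a\* !tmp1.rar !tmp1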
With my animosity growing toward winrar, I am passing the baton of information forward with an eye to that glorious day when we will know how to specifically exclude a folder and its contents using wildcards and not using the -ed switch.
(Quite an old question, but it may still be relevant.)
Maybe what you simply needed was this:
-x*\f2 -x*\f2\*
Two exclude switches should remove the directory f2 and all of its contents.
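Applied to the test bed above, the full command line would be something like:
C:\>"c:\program files\winrar\winrar.exe" a -r -x*\f2 -x*\f2\* !tmp1.rar !tmp1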
An even older question by now, but I came across it, so I reproduced your folder structure and, at least nowadays (WinRAR 5.11, not the latest but quite new), this works:
-x*\f2
So the whole command line is:
"C:\Program Files\WinRAR\Rar.exe" a -m5 -s !tmp1.rar !tmp1 -x*\f2
And this is what is stored in the .rar file:
!tmp1\f1\a\a.txt
!tmp1\f1\f1.txt
!tmp1\f1\a
!tmp1\f1
!tmp1
Similarly, if you use -x*\a, all a folders are excluded, storing this:
!tmp1\f1\f1.txt
!tmp1\f2\f2.txt
!tmp1\f1
!tmp1\f2
!tmp1
Finally, combining both parameters (-x*\f2 -x*\a), you get this:
!tmp1\f1\f1.txt
!tmp1\f1
!tmp1
To manage a large list of files to be excluded, you can create a text file and write all the excluded files/folders relative to the source folder:
1) Create a file list.txt and write the names of the excluded files/folders.
Note: * refers to the source; all files/folders are relative to the source folder.
*\f2
*\f3
2) Run the command
rar a -r -x#list.txt target.rar source-folder
