How to copy a directory structure but only include certain files - bash

I found a solution for my question in Windows but I'm using Ubuntu: How to copy a directory structure but only include certain files using Windows batch files?
As the title says, how can I recursively copy a directory structure but only include some files? For example, given the following directory structure:
folder1
  folder2
    folder3
      data.zip
      info.txt
      abc.xyz
    folder4
    folder5
      data.zip
      somefile.exe
      someotherfile.dll
The files data.zip and info.txt can appear everywhere in the directory structure. How can I copy the full directory structure, but only include files named data.zip and info.txt (all other files should be ignored)?
The resulting directory structure should look like this:
copy_of_folder1
  folder2
    folder3
      data.zip
      info.txt
    folder4
    folder5
      data.zip
Could you tell me a solution for Ubuntu?

$ rsync --recursive --include="data.zip" --include="*.txt" --filter="-! */" dir_1 copy_of_dir_1
To exclude dir3 regardless of where it is in the tree (even if it contains files that would match the --includes):
--exclude 'dir3/' (before `--filter`)
To exclude dir3 only at a specific location in the tree, specify an absolute path, starting from your source dir:
--exclude '/dir1/dir2/dir3/' (before `--filter`)
To exclude dir3 only when it's in dir2, but regardless of where dir2 is:
--exclude 'dir2/dir3/' (before `--filter`)
Wildcards can also be used in the path elements where * means a directory with any name and ** means multiple nested directories.
To specify only files and dirs to include, run two rsyncs, one for the files and one for the dirs. The problem with getting it done in a single rsync is that when you don't include a dir, rsync won't enter the dir and so won't discover any files in that branch that may be matching your include filter. So, you start by copying the files you want while not creating any dirs that would be empty. Then copy any dirs that you want.
$ rsync --recursive --prune-empty-dirs --include="*.txt" --filter="-! */" dir_1 copy_of_dir_1
$ rsync --recursive --include '/dir1/dir2/' --include '/dir3/dir4/' --filter="-! */" dir_1 copy_of_dir_1
You can combine these if you don't mind that your specified dirs don't get copied if they're empty:
$ rsync --recursive --prune-empty-dirs --include="*.txt" --include '/dir1/dir2/' --include '/dir3/dir4/' --filter="-! */" dir_1 copy_of_dir_1
The --filter="-! */" is necessary because rsync includes all files and folders that match none of the filters (imagine it as an invisible --include filter at the end of the list of filters). rsync checks each item to be copied against the list of filters and includes or excludes the item depending on the first match it finds. If there's no match, it hits that invisible --include and goes on to include the item. We wanted to change this default to --exclude, so we added an exclude filter (the - in -! */), then we negate the match (!) and match all dirs (*/). Since this is a negated match, the result is that we allow rsync to enter all the directories (which, as I mentioned earlier, allows rsync to find the files we want).
We use --filter instead of --exclude for the final filter because --exclude does not allow specifying negated matches with the ! operator.
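The first-match-wins behaviour described above can be tried on a throwaway copy of the question's tree. This is only a sketch: the scratch directory and the exact nesting of the sample folders are assumptions, and the demo is skipped when rsync is not installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch area; nothing here comes from the original answer.
work=$(mktemp -d)
cd "$work"

# Rebuild a sample tree resembling the one in the question.
mkdir -p folder1/folder2/{folder3,folder4,folder5}
touch folder1/folder2/folder3/{data.zip,info.txt,abc.xyz}
touch folder1/folder2/folder5/{data.zip,somefile.exe,someotherfile.dll}

if command -v rsync >/dev/null 2>&1; then
  # Rules are tried in order and the first match decides:
  #   1. include data.zip
  #   2. include info.txt
  #   3. exclude everything that is NOT a directory
  # Directories match none of the rules, so the implicit include applies.
  rsync --recursive \
        --include='data.zip' --include='info.txt' \
        --filter='-! */' \
        folder1/ copy_of_folder1/
  find copy_of_folder1 | sort
else
  echo "rsync is not installed; skipping the demo" >&2
fi
```

Adding --prune-empty-dirs to the same command should drop folder4, which ends up empty here.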

I don't have a beautiful one liner, but since nobody else has answered you can always:
find . -name 'file_name.extension' -print | cpio -pavd /path/to/receiving/folder
Run this once for each file name you want, after copying the directories.
(Make sure you're in the original folder first, of course! :) )
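For systems without cpio, the same two-step idea (directories first, then the wanted files) can be sketched with find, mkdir and GNU cp. The scratch directory and the src/dst variable names are made up for the illustration, and cp --parents is a GNU extension.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)   # hypothetical scratch directory
mkdir -p "$work/folder1/folder2/"{folder3,folder4,folder5}
touch "$work/folder1/folder2/folder3/"{data.zip,info.txt,abc.xyz}
touch "$work/folder1/folder2/folder5/"{data.zip,somefile.exe}

src="$work/folder1"
dst="$work/copy_of_folder1"
mkdir -p "$dst"
cd "$src"   # work with relative paths, as the answer advises

# Step 1: recreate every directory, including ones that stay empty.
find . -type d -exec sh -c 'mkdir -p "$1/$2"' _ "$dst" {} \;

# Step 2: copy only the wanted files; --parents keeps their relative paths.
find . -type f \( -name data.zip -o -name info.txt \) \
     -exec cp --parents {} "$dst" \;

find "$dst" | sort
```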

Here is a one-liner using rsync:
rsync -a -f"+ info.txt" -f"+ data.zip" -f'-! */' folder1/ copy_of_folder1/
If you already have a file list and want a more scalable solution:
xargs -I{} rsync -a -f"+ {}" -f'-! */' folder1/ copy_of_folder1/ < file.list

cp -pr folder1 copy_of_folder1; find copy_of_folder1 -type f ! \( -name data.zip -o -name info.txt \) -exec rm -f {} \;
First, copy folder1 to copy_of_folder1 in its entirety.
Then delete every file other than data.zip and info.txt.
In the end you have the complete directory structure containing only the data.zip and info.txt files.

Related

Extract recursively and append extension?

I want to make a script that can extract rar files recursively and append an extension to the extracting files.
The extension should be added during the process (so that other software doesn't see a recognised file extension and start its process until all files are extracted). Once all files are completed the extension should be removed.
Here is an example file structure...
/some/path/
folder1/
folder2/
file1.rar
folder3/
file2.rar
file3.rar
folder4/
file4.rar
I want this to turn to this...
/some/path/
folder1/
folder2/
file1.rar
file1.txt.extracting
folder3/
file2.rar
file2.txt.extracting
file3.rar
file3.txt.extracting
folder4/
file4.rar
file4.txt.extracting
Then once all are complete, to this...
/some/path/
folder1/
folder2/
file1.rar
file1.txt
folder3/
file2.rar
file2.txt
file3.rar
file3.txt
folder4/
file4.rar
file4.txt
I hope that makes sense. Is this possible?
The following should work:
cd "$(mktemp -d)"
find /some/path/ -name '*.rar' \
-exec unrar e {} \; \
-exec bash -c 'mv -- * "$(dirname "$1")"' _ {} \;
# if relevant, cd - to get back to the previous directory
This iterates over all the .rar files in /some/path and its subdirectories, extracting each one into a temporary directory and then moving the extracted content to the .rar file's directory.
Running bash -c instead of mv directly is required so that the * glob and the $(dirname ...) substitution are evaluated once per archive, after find has supplied its path (passed here as $1 rather than spliced into the command string, so paths containing spaces stay intact).
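The answer above does not cover the temporary suffix the question asks for. A minimal sketch of the final rename pass follows, assuming the extracted files were already given a .extracting suffix when they were moved into place. The scratch tree stands in for /some/path, and its folder layout is a guess at the question's flattened listing.

```shell
#!/usr/bin/env bash
set -euo pipefail

root=$(mktemp -d)   # hypothetical stand-in for /some/path
mkdir -p "$root/folder1/folder2" "$root/folder1/folder3" "$root/folder4"
touch "$root/folder1/folder2/file1.txt.extracting" \
      "$root/folder1/folder3/file2.txt.extracting" \
      "$root/folder4/file4.txt.extracting"

# Once every archive is done, drop the suffix in one pass.
# ${1%.extracting} strips the trailing ".extracting" from the path.
find "$root" -type f -name '*.extracting' \
     -exec bash -c 'mv -- "$1" "${1%.extracting}"' _ {} \;

find "$root" -type f | sort
```

Running the rename only after the extraction loop has finished is what keeps other software from picking up half-complete batches.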

Rsync only particular files and directory that contain those files

Suppose I have a directory structure like
src/
src/a/
src/a/1.ocf
src/a/1.pdf
src/a/1.txt
src/b/
src/b/2.ocf
src/b/2.pdf
src/b/2.xls
src/c/
src/c/3.doc
src/c/3.ocf
src/c/3.txt
src/d/
I want to synchronize only the files with the extension .txt, so I tried a command like:
#rsync -avvH --include="*/" --include="*.txt" --exclude="*" src/ dst/
sending incremental file list
./
a/
a/1.txt
b/
c/
c/3.txt
d/
Unfortunately, this command synchronizes not only the *.txt files but also every directory. I don't want directories 'b' and 'd' to be synchronized, because they contain no *.txt files.
Is there a simple way to do that?
The option you're looking for is -m to prune empty directories:
rsync -avvHm --include="*/" --include="*.txt" --exclude="*" src/ dst/

Copy all files in directory except ".txt" and not to replace existing files

I have to copy all the files from a source directory to a destination directory, but skip every file with the extension ".txt" and not replace a file if it is already present in the destination directory.
Example:
Source directory:
/a/aone.js
/a/atwo.js
/b/bone.txt
/b/btwo.js
Destination directory:
/a/atwo.js
Then it should copy only:
/a/aone.js
/b/btwo.js
and skip "/a/atwo.js" because it is already present in the destination folder,
and skip "/b/bone.txt" because its extension is ".txt".
I tried these commands, but they do not work:
find /path/to/source/ \( ! -name "*.txt" \) -type f | cp -n /path/to/destination/ -R
cp -n /path/to/source/*(!*.txt) /path/to/destination/ -R
Assuming you can use rsync (-v is verbose, -a is archive mode and -z compresses data in transit; --ignore-existing skips files that are already present in the destination):
rsync -vaz --ignore-existing --exclude "*.txt" /path/to/source/ /path/to/destination/
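Why make it difficult? You were on the right track. A simple:
cp -an /path/to/source/*.[^t*] /path/to/destination
copies from source every file whose name ends in a dot followed by a single character other than t (or a literal *), without overwriting existing files in destination. Note that the bracket expression matches exactly one character, so this pattern does not match multi-character extensions such as .js; for those, a pattern such as *.[^t]* (any extension that does not begin with t) is needed. It also presumes that file names contain no more than one dot. If they do, a few more lines of code will be needed.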
The following will illustrate use of the above:
$ mkdir tmp
$ mkdir a
$ mkdir b
$ touch a/a.{j,k,l,txt}
$ ls -1 a
a.j
a.k
a.l
a.txt
$ cp -an a/a*.[^t*] b
$ ls -1 b
a.j
a.k
a.l
When using cp, you must match the proper directory depth. If you have another intervening directory, then simply add an additional wildcard. For example:
$ ls -1 dat/*/*.[^t*]
dat/a/a.j
dat/a/a.k
dat/a/a.l
dat/b/a.j
dat/b/a.k
dat/b/a.l
If your directory structure gets more complex, then go with find or rsync. Both are excellent tools and rsync can handle both local and network transfers. cp is the right tool for small jobs, but when more flexibility is needed, then grab a bigger hammer.
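For trees of arbitrary depth, the find route mentioned above might look like the sketch below. It rebuilds the question's example in a scratch directory (the src and dst variable names are made up for the illustration), skips *.txt files, and leaves anything that already exists at the destination untouched. An explicit existence test is used instead of cp -n because the exit status of cp -n differs between coreutils releases.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)   # hypothetical scratch directory
src="$work/source"
dst="$work/destination"
mkdir -p "$src/a" "$src/b" "$dst/a"
echo new > "$src/a/aone.js"
echo new > "$src/a/atwo.js"
echo new > "$src/b/bone.txt"
echo new > "$src/b/btwo.js"
echo old > "$dst/a/atwo.js"   # already present; must not be replaced

cd "$src"
# Walk every regular file that is not *.txt, at any depth.
find . -type f ! -name '*.txt' -print0 |
while IFS= read -r -d '' f; do
  # Leave files that already exist at the destination alone.
  if [ -e "$dst/$f" ]; then
    continue
  fi
  mkdir -p "$dst/$(dirname "$f")"
  cp -p "$f" "$dst/$f"
done

find "$dst" -type f | sort
```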

replacing files in another directory

I have two directories structured as follows:
dir1/a/file1
dir1/a/b/file2
dir1/a/c/d/file3
and
dir2/a/file4
dir2/a/b/file5
dir2/a/c/d/file6
I want to copy all the files in the subdirectories under dir1 to dir2, but keep the files that are currently in dir2. In other words, I want the resulting structure to look like:
dir2/a/file1
dir2/a/file4
dir2/a/b/file2
dir2/a/b/file5
dir2/a/c/d/file3
dir2/a/c/d/file6
Is there a simple way to do this using bash?
You could start with
cd dir1
cp -rpuv * ../dir2/
Before:
$ find dir2/
dir2/
dir2/a
dir2/a/file4
dir2/a/c
dir2/a/c/d
dir2/a/c/d/file6
dir2/a/b
dir2/a/b/file5
After:
$ find dir2/
dir2/
dir2/a
dir2/a/file1
dir2/a/file4
dir2/a/c
dir2/a/c/d
dir2/a/c/d/file3
dir2/a/c/d/file6
dir2/a/b
dir2/a/b/file2
dir2/a/b/file5
Note that -r copies recursively, -p preserves permissions, -v makes the copy verbose, and -u copies a file only when the destination copy is missing or older than the source, so files that exist only in dir2 are left alone, as the question asks.

Making archive from files with same names in different directories

I have some files with same names but under different directories. For example, path1/filea, path1/fileb, path2/filea, path2/fileb,....
What is the best way to make the files into an archive? These directories also contain many other files that I don't want in the archive. Off the top of my head, I think of using Bash, probably ar, tar and other commands, but am not sure how exactly to do it.
Renaming the files seems to make the file names a little complicated. I tend to keep the directory structure inside the archive. Or I might be wrong. Other ideas are welcome!
Thanks and regards!
EDIT:
Examples would be really nice!
You can use tar with the --exclude PATTERN option. See the man page for more.
To exclude files, you can see this page for examples.
You may give the find command multiple directories to search through.
# example: create archive of .tex files (-x is the BSD find flag for staying on one filesystem; GNU find spells it -xdev)
find -x LaTeX-files1 LaTeX-files2 -name "*.tex" -print0 | tar --null --no-recursion -uf LaTeXfiles.tar --files-from -
To recursively copy only files with filename "filea" or "fileb" from /path/to/source to /path/to/archive, you could use:
rsync -avm --include='file[ab]' -f 'hide,! */' /path/to/source/ /path/to/archive/
'*/' is a pattern which matches 'any directory'
'! */' matches anything which is not a directory (i.e. a file)
'hide,! */' means hide all files
Filter rules are applied in order, and the first rule that matches is applied.
--include='file[ab]' has precedence, so if a file matches 'file[ab]', it is included.
Any other file gets excluded from the list of files to transfer.
Another alternative is to use the find...exec pattern:
mkdir /path/to/archive
cd /path/to/source
find . -type f -iname "file[ab]" -exec cp --parents '{}' /path/to/archive ";"
What I have used to make a tar ball for the files with same name in different directories is
$ find <path> -name <filename> -exec tar -rvf data.tar '{}' \;
i.e. tar [-]r --append
Hope this helps.
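As a compact variant of the find-plus-tar approaches above, the sketch below builds the archive in a single tar invocation and lists it to show that each entry keeps its directory prefix. The scratch directory, the extra file names and the archive name same_names.tar are made up for the illustration, and the options assume GNU tar.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)   # hypothetical scratch directory
cd "$work"
mkdir -p path1 path2
touch path1/{filea,fileb,other.log} path2/{filea,fileb,notes.md}

# Select only filea/fileb; NUL separators keep unusual file names intact.
find path1 path2 -type f \( -name filea -o -name fileb \) -print0 |
  tar --null --files-from=- -cf same_names.tar

# List the archive: same-named files stay distinct via their paths.
listing=$(tar -tf same_names.tar | sort)
printf '%s\n' "$listing"
```

Because the paths are stored relative to the current directory, extracting the archive elsewhere recreates path1/ and path2/ side by side without any renaming.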
